Preface


Read This First 


This user’s guide serves as a reference book for the TMS320C3x generation
of digital signal processors, which includes the TMS320C30, TMS320C31,
TMS320LC31, and TMS320C32. Throughout the book, all references to ’C3x
refer collectively to the ’C30, ’C31, and ’C32, and the names TMS320C30,
TMS320C31, and TMS320C32 refer to all speed variations unless an exception
is noted. This document provides information to assist managers and
hardware/software engineers in application development.


Specifically, this book complements the TMS320C3x User’s Guide by providing
information to assist you in application development. It includes example
code and hardware connections for various applications.


This guide presents examples of frequently used applications and discusses
more involved examples and applications. It also defines the principles
involved in many applications and gives the corresponding assembly language
code for instructional purposes and for immediate use. Whenever a detailed
explanation of the underlying theory is too extensive to be included in this
manual, appropriate references are given for further information.


Notational Conventions 


This document uses the following conventions: 


- Program listings, program examples, and interactive displays are shown
  in a special typeface that is similar to that of a typewriter. Examples
  use a bold version of the special typeface for emphasis. Interactive
  displays use a bold version of the special typeface to distinguish
  commands that you enter from items that the system displays (such as
  prompts, command output, error messages, etc.).


The following is a sample program listing: 


0011 0005 0001        .field  1, 2
0012 0005 0003        .field  3, 4
0013 0005 0006        .field  6, 3
0014 0006             .even


The following is an example of a system prompt and a command you might
enter:


C: csr -a /user/ti/simuboard/utilities


Any string within angle brackets is considered to be a variable. In syntax
descriptions, the variable is written in a typeface similar to that of the text.
The following is an example of a variable syntax:


<file name>   Path name of a UNIX file
<signal>      Name of a signal


In syntax descriptions, the instruction, command, or directive is in a bold 
typeface font and parameters are in an italic typeface. Portions of a syntax 
that are in bold should be entered as shown below. Portions of a syntax 
that are in italics describe the type of information that should be entered. 
The following is an example of a directive syntax: 


.asect "section name", address


In the preceding example, .asect is the directive. This directive has two
parameters, indicated by section name and address. When you use .asect,
the first parameter must be an actual section name, enclosed in double
quotes; the second parameter must be an address.


Square brackets ( [ and ] ) identify an optional parameter. If you use an 
optional parameter, you must specify the information within the brackets; 
you must not enter the brackets themselves. The following is an example 
of an instruction that has an optional parameter: 


LALK 16-bit constant [, shift] 


The LALK instruction has two parameters. The first parameter, 16-bit
constant, is required. The second parameter, shift, is optional. As this
syntax shows, if you use the optional second parameter, you must precede it
with a comma.


Square brackets are also used as part of the pathname specification for
VMS pathnames. In this case, the brackets are actually part of the
pathname (they are not optional).


- In assembler syntax statements, column 1 is reserved for the first
  character of a label or symbol. If the label or symbol is optional, it is
  usually not shown. If it is a required parameter, it is shown starting
  against the left margin of the shaded box, as in the example below. No
  instruction, command, directive, or parameter, other than a symbol or
  label, can begin in column 1.


symbol .usect "section name", size in bytes [, alignment]


The symbol is required for the .usect directive and must begin in column 1.
The section name must be enclosed in quotes, and the parameter size in
bytes must be separated from the section name by a comma. The alignment
is optional and, if used, must be separated by a comma.


- Braces ( { and } ) indicate a list. The symbol | (read as or) separates
  items within the list. The following is an example of a list:


{ * | *+ | *- }

This provides three choices: *, *+, or *-.


Unless the list is enclosed in square brackets, you must choose one item 
from the list. 


- Some directives can have a varying number of parameters. For example,
  the .byte directive can have up to 100 parameters. The syntax for this
  directive is:


    .byte value1 [, ... , valuen]


Note that .byte does not begin 
in column one. 


This syntax shows that .byte must have at least one value parameter, but
you have the option of supplying additional value parameters, each
separated from the previous one by a comma.
Information About Cautions and Warnings


This book may contain cautions and warnings. 


This is an example of a caution statement. 


A caution statement describes a situation that could potentially 
damage your software or equipment. 


This is an example of a warning statement. 


A warning statement describes a situation that could potentially 
cause harm to you. 


The information in a caution or a warning is provided for your protection. 
Please read each caution and warning carefully. 
Related Documentation From Texas Instruments


The following books describe the TMS320 floating-point devices and related
support tools. To obtain a copy of any of these TI documents, call the Texas
Instruments Literature Response Center at (800) 477-8924. When ordering,
please identify the book by its title and literature number.


JTAG/MPSD Emulation Technical Reference (literature number SPDU079) 
provides the design requirements of the XDS510™ emulator controller, 
discusses JTAG designs (based on the IEEE 1149.1 standard), and 
modular port scan device (MPSD) designs. 


Setting Up TMS320 DSP Interrupts in C Application Report (literature
number SPRA036) describes methods of setting up interrupts for the
TMS320 family of processors in C programming language. Sample code
segments are provided, along with complete examples of how to set up
interrupt vectors.


TLC32040C, TLC32040I, TLC32041C, TLC32041I Analog Interface
Circuits (literature number SLAS014E) data sheet contains the electrical
and timing specifications for these devices, as well as signal descriptions
and pinouts for all of the available packages.


TMS320C3x/C4x Assembly Language Tools User’s Guide (literature number
SPRU035) describes the assembly language tools (assembler, linker, and
other tools used to develop assembly language code), assembler
directives, macros, common object file format, and symbolic debugging
directives for the ’C3x and ’C4x generations of devices.


TMS320C3x/C4x Code Generation Tools Getting Started Guide (literature
number SPRU119) describes how to install the TMS320C3x/C4x
assembly language tools and the C compiler. Installation instructions are
included for MS-DOS™, Windows 3.x, Windows NT, Windows 95,
SunOS™, Solaris, and HP-UX™ systems.


TMS320C3x/C4x Optimizing C Compiler User’s Guide (literature number
SPRU034) describes the TMS320 floating-point C compiler. This C
compiler accepts ANSI standard C source code and produces TMS320
assembly language source code for the ’C3x and ’C4x generations of
devices.


TMS320C3x C Source Debugger (literature number SPRU053) describes
the ’C3x debugger for the emulator, evaluation module, and simulator.
This book discusses various aspects of the debugger interface, including
window management, command entry, code execution, data management,
and breakpoints. It also includes a tutorial that introduces basic
debugger functionality.


TMS320C3x User’s Guide (literature number SPRU031) describes the ’C3x
32-bit floating-point microprocessor (developed for digital signal
processing as well as general applications), its architecture, internal
register structure, instruction set, pipeline, specifications, and DMA and
serial port operation. Software and hardware applications are included.


TMS320C30 Digital Signal Processor (literature number SPRS032A) data 
sheet contains the electrical and timing specifications for this device, as 
well as signal descriptions and pinouts for all of the available packages. 


TMS320C31, TMS320LC31 Digital Signal Processors (literature number 
SPRS035) data sheet contains the electrical and timing specifications for 
these devices, as well as signal descriptions and pinouts for all of the 
available packages. 


TMS320C32 Digital Signal Processor (literature number SPRS027C) data 
sheet contains the electrical and timing specifications for this device, as 
well as signal descriptions and pinouts for all of the available packages. 


TMS320 DSP Development Support Reference Guide (literature number
SPRU011) describes the TMS320 family of digital signal processors and
the tools that support these devices. Included are code-generation tools 
(compilers, assemblers, linkers, etc.) and system integration and debug 
tools (simulators, emulators, evaluation modules, etc.). Also covered are 
available documentation, seminars, the university program, and factory 
repair and exchange. 


TMS320 Family Development Support Reference Guide (literature number
SPRU011E) describes the TMS320 family of digital signal processors
and the various products that support it. This includes code-generation 
tools (compilers, assemblers, linkers, etc.) and system integration and 
debug tools (simulators, emulators, evaluation modules, etc.). This book 
also lists related documentation, outlines seminars and the university 
program, and provides factory repair and exchange information. 
TMS320 Third-Party Support Reference Guide (literature number
SPRU052C) alphabetically lists over 100 third parties who supply
various products that serve the family of TMS320 digital signal processors,
including software and hardware development tools, speech recognition,
image processing, noise cancellation, modems, etc.


ABEL is a trademark of DATA I/O.


CodeView, MS, MS-DOS, MS-Windows, and Presentation Manager are registered trademarks of 
Microsoft Corporation. 


DEC, Digital DX, Ultrix, VAX, and VMS are trademarks of Digital Equipment Corporation.
HPGL is a registered trademark of Hewlett Packard Company.
Macintosh and MPW are trademarks of Apple Computer Corp.


Micro Channel, OS/2, PC-DOS, and PGA are trademarks of International Business Machines Corporation.


SPARC, Sun 3, Sun 4, Sun Workstation, SunView, and SunWindows are trademarks of Sun Microsystems, Inc.


UNIX is a registered trademark in the United States and other countries, licensed exclusively through 
X/Open Company Limited. 
If You Need Assistance


World-Wide Web Sites
TI Online                                        http://www.ti.com
Semiconductor Product Information Center (PIC)   http://www.ti.com/sc/docs/pic/home.htm
DSP Solutions                                    http://www.ti.com/dsps
320 Hotline On-line™                             http://www.ti.com/sc/docs/dsps/support.htm


North America, South America, Central America 


Product Information Center (PIC) (972) 644-5580 
TI Literature Response Center U.S.A. (800) 477-8924 
Software Registration/Upgrades (214) 638-0333 Fax: (214) 638-7742 

U.S.A. Factory Repair/Hardware Upgrades (281) 274-2285 

U.S. Technical Training Organization (972) 644-5580 

DSP Hotline (281) 274-2320 Fax: (281) 274-2324 Email: dsph@ti.com 
DSP Modem BBS (281) 274-2323 

DSP Internet BBS via anonymous ftp to ftp://ftp.ti.com/pub/tms320bbs 


Europe, Middle East, Africa 
European Product Information Center (EPIC) Hotlines:
Multi-Language Support +33 1 30 70 11 69 Fax: +33 1 30 70 10 32 Email: epic@ti.com
Deutsch +49 8161 80 33 11 or +33 1 30 70 11 68
English +33 1 30 70 11 65
Français +33 1 30 70 11 64
Italiano +33 1 30 70 11 67
EPIC Modem BBS +33 1 30 70 11 99
European Factory Repair +33 4 93 22 25 40
Europe Customer Training Helpline Fax: +49 81 61 80 40 10


Asia-Pacific 

Literature Response Center +852 2 956 7288 Fax: +852 2 956 2200
Hong Kong DSP Hotline +852 2 956 7268 Fax: +852 2 956 1002
Korea DSP Hotline +82 2 551 2804 Fax: +82 2 551 2828
Korea DSP Modem BBS +82 2 551 2914

Singapore DSP Hotline Fax: +65 390 7179
Taiwan DSP Hotline +886 2 377 1450 Fax: +886 2 377 2718
Taiwan DSP Modem BBS +886 2 376 2592

Taiwan DSP Internet BBS via anonymous ftp to ftp://dsp.ee.tit.edu.tw/pub/TI/ 


Japan 

Product Information Center +0120-81-0026 (in Japan) Fax: +0120-81-0036 (in Japan) 
+03-3457-0972 or (INTL) 813-3457-0972 Fax: +03-3457-1259 or (INTL) 813-3457-1259 

DSP Hotline +03-3769-8735 or (INTL) 813-3769-8735 Fax: +03-3457-7071 or (INTL) 813-3457-7071 

DSP BBS via Nifty-Serve Type “Go TIASP” 


Documentation 


When making suggestions or reporting errors in documentation, please include the following information that is on the title 
page: the full title of the book, the publication date, and the literature number. 
Mail:  Texas Instruments Incorporated
       Technical Documentation Services, MS 702
       P.O. Box 1443
       Houston, Texas 77251-1443
Email: dsph@ti.com


Note: When calling a Literature Response Center to order documentation, please specify the literature number of the 
book. 
Chapter 1


Processor Initialization 


Before you execute a DSP algorithm, you must initialize the processor.
Initialization brings the processor to a known state. Generally, this occurs
anytime after the processor is reset. This chapter reviews the concepts of
processor initialization explained in the user’s guide and provides examples.


Topic

1.1 Reset Process
1.2 Reset Signal Generation
1.3 How to Initialize the Processor
1.4 Low-Power Mode Interrupt


1.1 Reset Process 


You can reset the processor by applying a low level to the RESET input for at
least ten H1 cycles. The ’C3x terminates execution and puts the reset vector
(the contents of memory location 0) in the program counter. The reset vector
normally contains the address of the system-initialization routine. The
hardware reset also initializes various registers and status bits.


In order to reset the ’C3x correctly, you need to comply with several hardware 
and software requirements: 


- If the ’C31 or ’C32 is in microcomputer mode, set the INTx pins (as
  discussed in the Using the TMS320C31 and TMS320C32 Boot Loaders chapter
  of the TMS320C3x User’s Guide) so that the boot loader works properly.


- Provide the correct reset vector value; the reset vector normally contains
  the address of the system initialization routine.


  - In microcomputer mode, the reset vector is initialized automatically by
    the processor to point to the beginning of the on-chip boot loader code.
    No user action is required.


  - In microprocessor mode, the reset vector is typically stored in an
    EPROM. Example 1-1 shows how you can initialize that vector.


- Apply a low level to the RESET input (see section 1.2).


1.2 Reset Signal Generation 


The reset input controls the initialization of internal ’C3x logic and also causes
the execution of the system initialization software. For proper system
initialization, the reset signal must be applied for at least ten H1 cycles, that
is, 600 ns for a ’C3x operating at 33.33 MHz. Upon power up, however, it can take
20 ms or more before the system oscillator reaches a stable operating state.
Therefore, the power-up reset circuit should generate a low pulse on the reset
line for 100 to 200 ms. Once a proper reset pulse has been applied, the processor
fetches the reset vector from location 0, which contains the address of the system
initialization routine. Figure 1-1 shows a circuit that generates an appropriate
power-up reset signal.
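
The ten-cycle minimum can be checked with a little arithmetic. The C sketch below is illustrative only: the function names are hypothetical, and it assumes that one H1 cycle lasts two input-clock periods, an assumption chosen because it reproduces the 600 ns quoted above for a 33.33-MHz device.

```c
#include <stdio.h>

/* Assumption: one H1 cycle spans two input-clock periods, which
   reproduces the 600 ns quoted for a 33.33-MHz device. */
#define CLKIN_PERIODS_PER_H1 2.0
#define RESET_MIN_H1_CYCLES  10.0

/* Minimum RESET low time in nanoseconds for a clock given in MHz. */
double min_reset_ns(double clkin_mhz)
{
    double h1_period_ns = CLKIN_PERIODS_PER_H1 * 1000.0 / clkin_mhz;
    return RESET_MIN_H1_CYCLES * h1_period_ns;
}

/* Print the hard minimum next to the recommended power-up pulse. */
void print_reset_budget(double clkin_mhz)
{
    printf("min RESET low: %.0f ns; power-up pulse: 100 to 200 ms\n",
           min_reset_ns(clkin_mhz));
}
```

The gap between the two figures is the point: the recommended 100 to 200 ms power-up pulse is set by oscillator start-up time, which is orders of magnitude longer than the ten-cycle minimum.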


Figure 1-1. Reset Circuit


(Schematic not reproduced; the circuit drives the RESET input of the ’C3x.)
1.3 How to Initialize the Processor


After reset, the ’C3x jumps to the address stored in the reset vector location
and starts execution from that point. The reset vector normally contains the
address of the system initialization routine.


The initialization routine typically performs several tasks: 


- Sets the data-page pointer (DP) register
- Sets the stack pointer
- Sets the interrupt vector table
- Sets the trap vector table
- Sets the external memory control register
- Clears/enables cache


Note:


When running under microcomputer mode (MCBL/MP = 1), the on-chip bootloader
automatically initializes the external memory-control register values from
the bootloader table.


The ’C3x can be initialized using assembly language or C. 


1.3.1 Processor Initialization Under Assembly Language


If you are running under an assembly-only environment, Example 1-1 provides
a basic initialization routine. This example shows code for initializing the
’C3x to the following machine state:


- All interrupts are enabled.
- The overflow mode is disabled.
- The program cache is enabled.
- The DP register is initialized to 0.
- The memory-mapped control registers are initialized.
- The internal memory is filled with 0s.
Example 1-1. TMS320C3x Processor Initialization


* TITLE PROCESSOR INITIALIZATION
*
         .global RESET,INIT,BEGIN
         .global INT0,INT1,INT2,INT3
         .global ISR0,ISR1,ISR2,ISR3
         .global DINT,DMA
         .global TINT0,TINT1,XINT0,RINT0,XINT1,RINT1
         .global TIME0,TIME1,XMT0,RCV0,XMT1,RCV1
         .global TRAP0,TRAP1,TRAP2,TRP0,TRP1,TRP2
*
* PROCESSOR INITIALIZATION FOR THE TMS320C3x
*
* RESET AND INTERRUPT VECTOR SPECIFICATION. THIS
* ARRANGEMENT ASSUMES THAT DURING LINKING, THE FOLLOWING
* TEXT SEGMENT WILL BE PLACED TO START AT MEMORY
* LOCATION 0.
*
         .sect   "init"        ; Named section
RESET    .word   INIT          ; RS- loads address INIT to PC
INT0     .word   ISR0          ; INT0- loads address ISR0 to PC
INT1     .word   ISR1          ; INT1- loads address ISR1 to PC
INT2     .word   ISR2          ; INT2- loads address ISR2 to PC
INT3     .word   ISR3          ; INT3- loads address ISR3 to PC
XINT0    .word   XMT0          ; Serial port 0 transmit interrupt processing
RINT0    .word   RCV0          ; Serial port 0 receive interrupt processing
XINT1    .word   XMT1          ; Serial port 1 transmit interrupt processing
RINT1    .word   RCV1          ; Serial port 1 receive interrupt processing
TINT0    .word   TIME0         ; Timer 0 interrupt processing
TINT1    .word   TIME1         ; Timer 1 interrupt processing
DINT     .word   DMA           ; DMA interrupt processing
         .space  20            ; Reserved space
TRAP0    .word   TRP0          ; Trap 0 vector processing begins
TRAP1    .word   TRP1          ; Trap 1 vector processing begins
TRAP2    .word   TRP2          ; Trap 2 vector processing begins
         .space  29            ; Leave space for the other 29 traps
*
* IN THE FOLLOWING SECTION, CONSTANTS THAT CANNOT BE REPRESENTED
* IN THE SHORT FORMAT ARE INITIALIZED. THE NUMBERS IN PARENTHESES
* AT THE END OF EACH COMMENT REPRESENT THE OFFSET OF THE
* REGISTER FROM 808000H (CTRL)
*
         .data
MASK     .word   0FFFFFFFFH
BLK0     .word   0809800H      ; Beginning address of RAM block 0
BLK1     .word   0809C00H      ; Beginning address of RAM block 1
STCK     .word   0809F00H      ; Beginning of stack
CTRL     .word   0808000H      ; Pointer for peripheral-bus memory map
DMACTL   .word   0000000H      ; Init for DMA control (0)
TIM0CTL  .word   0000000H      ; Init of timer 0 control (32)
TIM1CTL  .word   0000000H      ; Init of timer 1 control (48)
SERGLOB0 .word   0000000H      ; Init of serial 0 glbl control (64)
SERPRTX0 .word   0000000H      ; Init of serial 0 xmt port control (66)
SERPRTR0 .word   0000000H      ; Init of serial 0 rcv port control (67)
SERTIM0  .word   0000000H      ; Init of serial 0 timer control (68)
SERGLOB1 .word   0000000H      ; Init of serial 1 glbl control (80)
SERPRTX1 .word   0000000H      ; Init of serial 1 xmt port control (82)
SERPRTR1 .word   0000000H      ; Init of serial 1 rcv port control (83)
SERTIM1  .word   0000000H      ; Init of serial 1 timer control (84)
PARINT   .word   0000000H      ; Init of parallel interface control (100)
IOINT    .word   0000000H      ; Init of I/O interface control (96)
*
         .text
*
* THE ADDRESS AT MEMORY LOCATION 0 DIRECTS EXECUTION TO BEGIN HERE
* FOR RESET PROCESSING THAT INITIALIZES THE PROCESSOR. WHEN RESET
* IS APPLIED, THE FOLLOWING REGISTERS ARE INITIALIZED TO 0:
*
*   ST  -- CPU STATUS REGISTER
*   IE  -- CPU/DMA INTERRUPT ENABLE FLAGS
*   IF  -- CPU INTERRUPT FLAGS
*   IOF -- I/O FLAGS
*
* THE STATUS REGISTER HAS THE FOLLOWING ARRANGEMENT:
*
* BITS:     31-14 13  12 11 10 9     8  7   6   5  4  3 2 1 0
* FUNCTION: RESRV GIE CC CE CF RESRV RM OVM LUF LV UF N Z V C
*
INIT     LDP     0,DP          ; Point the DP register to page 0
         LDI     1800H,ST      ; Clear and enable cache, and disable OVM
         LDI     @MASK,IE      ; Unmask all interrupts
*
* INTERNAL DATA MEMORY INITIALIZATION TO FLOATING POINT 0
*
         LDI     @BLK0,AR0     ; AR0 points to block 0
         LDI     @BLK1,AR1     ; AR1 points to block 1
         LDF     0.0,R0        ; 0 register R0
         RPTS    1023          ; Repeat 1024 times
         STF     R0,*AR0++(1)  ; Zero out location in RAM block 0 and
||       STF     R0,*AR1++(1)  ; Zero out location in RAM block 1
*
* THE PROCESSOR IS INITIALIZED. THE REMAINING APPLICATION-
* DEPENDENT PART OF THE SYSTEM (BOTH ON- AND OFF-CHIP) SHOULD
* NOW BE INITIALIZED.
*
* FIRST, INITIALIZE THE CONTROL REGISTERS. IN THIS EXAMPLE,
* EVERYTHING IS INITIALIZED TO 0, SINCE THE ACTUAL INITIALIZATION IS
* APPLICATION-DEPENDENT.
*
         LDI     @CTRL,AR0     ; Load in AR0 the pointer to control
*                              ; registers
         LDI     @DMACTL,R0
         STI     R0,*+AR0(0)   ; Init DMA control
         LDI     @TIM0CTL,R0
         STI     R0,*+AR0(32)  ; Init timer 0 control
         LDI     @TIM1CTL,R0
         STI     R0,*+AR0(48)  ; Init timer 1 control
         LDI     @SERGLOB0,R0
         STI     R0,*+AR0(64)  ; Init serial 0 global control
         LDI     @SERPRTX0,R0
         STI     R0,*+AR0(66)  ; Init serial 0 xmt control
         LDI     @SERPRTR0,R0
         STI     R0,*+AR0(67)  ; Init serial 0 rcv control
         LDI     @SERTIM0,R0
         STI     R0,*+AR0(68)  ; Init serial 0 timer control
         LDI     @SERGLOB1,R0
         STI     R0,*+AR0(80)  ; Init serial 1 global control
         LDI     @SERPRTX1,R0
         STI     R0,*+AR0(82)  ; Init serial 1 xmt control
         LDI     @SERPRTR1,R0
         STI     R0,*+AR0(83)  ; Init serial 1 rcv control
         LDI     @SERTIM1,R0
         STI     R0,*+AR0(84)  ; Init serial 1 timer control
         LDI     @PARINT,R0
         STI     R0,*+AR0(100) ; Init parallel interface
                               ; control (C30 only)
         LDI     @IOINT,R0
         STI     R0,*+AR0(96)  ; Init I/O interface control
*
         LDI     @STCK,SP      ; Init the stack pointer
         OR      2000H,ST      ; Global interrupt enable
*
         BR      BEGIN         ; Branch to the beginning of application
         .end
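
The numbers in parentheses in Example 1-1 are word offsets from the peripheral-bus base address 808000h. As a rough companion to the assembly listing, the C sketch below collects those offsets in one place and computes the resulting register addresses. The enumerator and function names are hypothetical and do not come from any TI header; only the base address and the offsets are taken from the example.

```c
#include <stdint.h>

/* Base of the peripheral-bus memory map (CTRL in Example 1-1). */
#define C3X_CTRL_BASE 0x808000u

/* Word offsets copied from the comments in Example 1-1. The names
   are illustrative only, not identifiers from a TI header file. */
enum c3x_ctrl_offset {
    OFF_DMA_CTL     = 0,
    OFF_TIMER0_CTL  = 32,
    OFF_TIMER1_CTL  = 48,
    OFF_SER0_GLOBAL = 64,
    OFF_SER0_XMT    = 66,
    OFF_SER0_RCV    = 67,
    OFF_SER0_TIMER  = 68,
    OFF_SER1_GLOBAL = 80,
    OFF_SER1_XMT    = 82,
    OFF_SER1_RCV    = 83,
    OFF_SER1_TIMER  = 84,
    OFF_IO_IFACE    = 96,
    OFF_PARALLEL    = 100
};

/* Word address of a control register, mirroring *+AR0(offset)
   when AR0 holds the base address. */
uint32_t c3x_ctrl_addr(enum c3x_ctrl_offset off)
{
    return C3X_CTRL_BASE + (uint32_t)off;
}
```

For instance, c3x_ctrl_addr(OFF_TIMER0_CTL) yields 808020h. On a host machine this is only address arithmetic; dereferencing the result is meaningful solely on the target.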
1.3.2 Processor Initialization Under C Language


If you are running under a C environment, your initialization routine is typically
boot.asm (from the RTS30.LIB library that comes with the floating-point
compiler). In addition to initializing global variables, boot.asm initializes the
DP register (pointing to the .bss section) and the stack pointer (SP) register
(pointing to the .stack section). You must enable the cache, as shown in
Example 1-2, and set up your interrupts inside your main routine before you
enable interrupts. See the application report, Setting Up TMS320 DSP
Interrupts in C, for more information.


Example 1-2. Enabling the Cache


main()
{
    asm(" or 1800h,st");         /* enable cache */
    /* asm(" or 3800h,st"); */   /* enable cache and interrupts */
}
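
The constants 1800h and 3800h follow from the status-register layout given in the comments of Example 1-1. The sketch below rebuilds them from named bit masks; the macro and function names are hypothetical, and only the bit positions are taken from that layout.

```c
#include <stdint.h>

/* Bit positions from the status-register layout in Example 1-1.
   The macro names are illustrative only. */
#define ST_OVM (1u << 7)   /* overflow mode           */
#define ST_CF  (1u << 10)  /* cache freeze            */
#define ST_CE  (1u << 11)  /* cache enable            */
#define ST_CC  (1u << 12)  /* cache clear             */
#define ST_GIE (1u << 13)  /* global interrupt enable */

/* Value ORed into ST to clear and enable the cache. */
uint32_t st_cache_on(void)
{
    return ST_CC | ST_CE;
}

/* Value ORed into ST to enable the cache and global interrupts. */
uint32_t st_cache_and_interrupts(void)
{
    return ST_CC | ST_CE | ST_GIE;
}
```

Neither value sets OVM, which matches the "disable OVM" comment on the corresponding LDI in the assembly version.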


1.4 Low-Power Mode Interrupt 


This section explains how to generate interrupts when the IDLE2 power-down 
mode is used. 


The execution of the IDLE2 instruction causes the H1 and H3 processor clocks
to be held at a constant level until the occurrence of an external interrupt. To
use the IDLE2 power management feature effectively, interrupts must be
generated with or without the presence of the H1 clock. For normal (non-IDLE2)
operation, however, the interrupt inputs must be synchronized with the falling
edge of the H1 clock. An interrupt must satisfy the following conditions:


- It must meet the setup time on the falling edge of H1.
- It must be at least one cycle and less than two cycles in duration.
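
The duration condition can be written as a simple range check. The helper below is a hypothetical sketch: it covers only the pulse-width window and does not model the setup-time requirement, which depends on data-sheet timing values not given here.

```c
#include <stdbool.h>

/* True when a low-going interrupt pulse lasts at least one and
   less than two H1 cycles. Both arguments are in nanoseconds.
   Setup time relative to the falling edge of H1 is not checked. */
bool int_pulse_width_ok(double pulse_ns, double h1_period_ns)
{
    return pulse_ns >= h1_period_ns && pulse_ns < 2.0 * h1_period_ns;
}
```

With the 60-ns H1 cycle implied by the timing in section 1.2, a 90-ns pulse passes the check, while 30-ns and 120-ns pulses do not.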


For an interrupt to be recognized during IDLE2 operation and to turn the clocks
back on, it must first be held low for one H1 cycle. The logic in Figure 1-2 can
be used to generate an interrupt signal to the ’C3x with the correct timing
during non-IDLE2 and IDLE2 operation. Figure 1-2 shows the interrupt circuit,
which uses a 16R4 programmable logic device (PLD) to generate the
appropriate interrupt signal.


Figure 1-2. Interrupt Generation Circuit for Use With IDLE2 Operation


(Schematic not reproduced; labels in the figure: ’C3x, TIBPAL16R4, Interrupt,
INTx driven from PLD pin 12, H1 to PLD CLK.)


Example 1-3 shows the PLD equations for the 16R4 using the ABEL™ lan- 
guage. This implementation makes the following assumptions regarding the 
interrupt source: 


- The interrupt source is a low-going pulse or a falling edge. If the interrupt
  source stays active for more than one H1 cycle, it is regarded as the same
  interrupt request and not a new one.

- The interrupt source is at least one H1 cycle in duration. One H1 cycle is
  required to turn the H1 clock on again.


The interrupt is driven active as soon as the interrupt source goes active. It
goes inactive again on detection of two H3 rising edges. These two rising
edges ensure that the interrupt is recognized during normal operation and
after the end of IDLE2 operation (when the clocks turn on again). The interrupt
goes inactive after the two H3 clocks are counted and does not go active
again until after the interrupt source goes inactive and returns to active.
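

The behavior described above can be checked with a small software model. The following C sketch is illustrative only and is not part of the PLD design: it mirrors the three states (idle, sync_st, wait) and the output equation of Example 1-3, and the names pld_state, pld_clock, and pld_int_active are hypothetical.

```c
#include <stdbool.h>

/* State encoding follows Example 1-3: outstate = [samesrc, sync]. */
typedef enum { PLD_IDLE = 0, PLD_SYNC = 1, PLD_WAIT = 2 } pld_state;

/* Next state on a rising clock edge. `source` is true while the
   interrupt source is active. */
pld_state pld_clock(pld_state s, bool source)
{
    if (!source)
        return PLD_IDLE;                  /* source released: rearm */
    return (s == PLD_IDLE) ? PLD_SYNC : PLD_WAIT;
}

/* Combinational output: true while the interrupt to the 'C3x is driven
   active. Mirrors  !intx_ = (source # sync) & !samesrc. */
bool pld_int_active(pld_state s, bool source)
{
    bool sync    = (s & 1) != 0;
    bool samesrc = (s & 2) != 0;
    return (source || sync) && !samesrc;
}
```

Starting from idle with the source held active, the output is active immediately, stays active after the first clock edge (sync_st), and goes inactive after the second edge (wait), which matches the two-edge behavior described in the text.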


Example 1-3. State Machine and Equations for the Interrupt Generation 16R4 PLD 


MODULE INTERRUPT_GENERATION
TITLE 'INTERRUPT_GENERATION FOR IDLE2 AND NON-IDLE2 TMS320C31A
TMS320C31'
        c3xu5     device 'P16R4';

"inputs
        h3        Pin 1;
        intsrc_   Pin 2;    "Interrupt source

"output
        intx_     Pin 12;   "Interrupt input signal to the TMS320C31
        sync_src_ Pin 14;   "Internal signal used to synchronize the
                            "input to the H1 clock
        same_     Pin 15;   "Keeps track if the new interrupt source
                            "has occurred. If active, no new interrupt
                            "has occurred.

"This logic makes the following assumptions:
"The duration of the interrupt source is at least one H1
"cycle in duration. It takes one H1 cycle to turn the H1
"clock on again.
"
"The interrupt source is pulse- or level-triggered. If the
"source stays active after being asserted, it is regarded
"as the same interrupt request and not a new one.

"Name Substitutions for Test Vectors and Equations
        c,H,L,X   = .C.,1,0,.X.;
        source    = !intsrc_;
        sync      = !sync_src_;
        samesrc   = !same_;
        c3xint    = !intx_;

"state bits
        outstate  = [samesrc, sync];
        idle      = ^b00;
        sync_st   = ^b01;   "synchronize state
        wait      = ^b10;   "wait for interrupt source to go inactive

state_diagram outstate
  state idle:
        if (source) then sync_st
        else idle;

  state sync_st:
        if (source) then wait
        else idle;

  state wait:
        if (source) then wait
        else idle;

equations
        !intx_ = (source # sync) & !samesrc;

@page
"Test interrupt generation logic

test_vectors
        ([h3, source] -> [outstate, c3xint])
        [c, L] -> [idle,    L];  "check start from idle
        [L, H] -> [idle,    H];  "test normal interrupt operation
        [c, H] -> [sync_st, H];
        [c, L] -> [idle,    L];
        [c, L] -> [idle,    L];
        [L, H] -> [idle,    H];  "test coming out of idle2 operation
        [L, H] -> [idle,    H];
        [c, H] -> [sync_st, H];
        [c, L] -> [idle,    L];
        [c, H] -> [sync_st, H];  "test same source
        [c, H] -> [wait,    L];
        [c, H] -> [wait,    L];
        [c, L] -> [idle,    L];
        [L, H] -> [idle,    H];  "test idle2 operation
        [L, H] -> [idle,    H];
        [L, H] -> [idle,    H];
end interrupt_generation


Program Control


This chapter discusses a group of ’C3x instructions that provide program control 
and facilitate all types of high-speed processing. These instructions handle: 


- Regular calls
- Software stack
- Interrupts
- Delayed branches
- Single- and multiple-instruction loops without any overhead


2.1 Subroutines


The 'C3x has a 24-bit program counter (PC) and a practically unlimited soft- 
ware stack. The CALL and CALLcond instructions cause the stack pointer to 
increment and store the contents of the next value of the program counter on 
the stack. At the end of the subroutine, the RETScond instruction performs a 
conditional return. 


Example 2-1 illustrates how to use a subroutine to determine the dot product
between two vectors. Given two vectors of length N, represented by the arrays
a[0], a[1], ..., a[N-1] and b[0], b[1], ..., b[N-1], the dot product is computed
from the expression


d = a[0] b[0] + a[1] b[1] + ... + a[N-1] b[N-1]


Processing proceeds in the main routine to the point at which the dot product 
is to be computed. It is assumed that the arguments of the subroutine have been 
appropriately initialized. At this point, a CALL is made to the subroutine, transfer- 
ring control to that section of the program memory for execution, then returning 
to the calling routine through the RETS instruction when execution has com- 
pleted. For Example 2-1, it would suffice to save only register R2. However, 
many registers are saved for demonstration purposes. The saved registers are 
stored on the system stack. This stack must be large enough to accommodate 
the maximum anticipated storage requirements. You can use other methods of 
saving registers, also. 
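

As a reference for the arithmetic only, the following C sketch computes the same dot product with the same ordering of operations as the DOT subroutine: each multiply is paired with the accumulation of the previous product, and one final addition completes the sum. The function name dot_pipelined is hypothetical, and the sketch says nothing about register saving or cycle counts.

```c
#include <stddef.h>

/* Dot product arranged like the subroutine in Example 2-1: the multiply for
   element i is paired with the accumulate of the product for element i-1.
   Requires n >= 2, as the assembly version does. */
float dot_pipelined(const float *a, const float *b, size_t n)
{
    float prod = a[0] * b[0];      /* plays the role of R0 */
    float acc  = 0.0f;             /* plays the role of R2 */

    for (size_t i = 1; i < n; i++) {
        float next = a[i] * b[i];  /* MPYF3: a(i) * b(i) */
        acc += prod;               /* || ADDF3: add the previous product */
        prod = next;
    }
    return prod + acc;             /* final ADDF3 */
}
```

For a = {1, 2, 3} and b = {4, 5, 6}, the function returns 32.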


Example 2-1. Subroutine Call (Dot Product)


*
* TITLE SUBROUTINE CALL (DOT PRODUCT)
*
*
* MAIN ROUTINE THAT CALLS THE SUBROUTINE 'DOT' TO COMPUTE THE
* DOT PRODUCT OF TWO VECTORS
*
*
*       .
*       LDI   @blk0,AR0       ; AR0 points to vector a
*       LDI   @blk1,AR1       ; AR1 points to vector b
*       LDI   N,RC            ; RC contains the number of elements
*       CALL  DOT
*       .
*
*
* SUBROUTINE DOT
*
*
* EQUATION: d = a(0) * b(0) + a(1) * b(1) + ... + a(N-1) * b(N-1)
*
* THE DOT PRODUCT OF a AND b IS PLACED IN REGISTER R0. N MUST
* BE GREATER THAN OR EQUAL TO 2.
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+----------------------
*   AR0      | ADDRESS OF a(0)
*   AR1      | ADDRESS OF b(0)
*   RC       | LENGTH OF VECTORS (N)
*
* REGISTERS USED AS INPUT: AR0, AR1, RC
* REGISTER MODIFIED: R0
* REGISTER CONTAINING RESULT: R0
*
*
DOT     PUSH  ST              ; Save status register
        PUSH  R2              ; Use the stack to save R2's
        PUSHF R2              ; lower 32 and upper 32 bits
        PUSH  AR0             ; Save AR0
        PUSH  AR1             ; Save AR1
        PUSH  RC              ; Save RC
*
*                             ; Initialize R0:
        MPYF3 *AR0,*AR1,R0    ; a(0) * b(0) -> R0
        LDF   0.0,R2          ; Initialize R2
        SUBI  2,RC            ; Set RC = N-2
*
* DOT PRODUCT (1 <= i < N)
*
        RPTS  RC              ; Setup the repeat single
        MPYF3 *++AR0(1),*++AR1(1),R0  ; a(i) * b(i) -> R0
||      ADDF3 R0,R2,R2        ; a(i-1)*b(i-1) + R2 -> R2
*
        ADDF3 R0,R2,R0        ; a(N-1)*b(N-1) + R2 -> R0
*
* RETURN SEQUENCE
*
        POP   RC              ; Restore RC
        POP   AR1             ; Restore AR1
        POP   AR0             ; Restore AR0
        POPF  R2              ; Restore top 32 bits of R2
        POP   R2              ; Restore bottom 32 bits of R2
        POP   ST              ; Restore ST
        RETS                  ; Return
*
* end
*
        .end


2.2 Stacks and Queues


The 'C3x provides a dedicated stack pointer (SP) register for building stacks
in memory. Also, the auxiliary registers can be used to build user stacks and
a variety of more general linear lists. This section discusses the implementa-
tion of the following types of linear lists:


Stack       A linear list for which all insertions and deletions are made
            at one end of the list.

Queue       A linear list for which all insertions are made at one end of
            the list, and all deletions are made at the other end.

Dequeue     A double-ended queue for which insertions and deletions
            are made at either end of the list.


2.2.1 System Stacks


A stack in the 'C3x fills from a low-memory address to a high-memory address,
as shown in Figure 2-1. A system stack stores addresses and data during sub-
routine calls, traps, and interrupts.


Figure 2-1. System Stack Configuration 


          Low memory
          Bottom of stack
             .
             .
   SP --> Top of stack
          (Free)
          High memory


The stack pointer is a 32-bit register that contains the address of the top of the 
system stack. The SP always points to the last element pushed onto the stack. 
A push performs a preincrement, and a pop performs a postdecrement of the 
SP. Make provisions to accommodate your software’s anticipated storage re- 
quirements. 
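

The push and pop ordering can be pictured with a small C model. This is a sketch under stated assumptions, not a description of the hardware: mem stands in for data memory, sp for the SP register, and the names sys_stack, sys_push, and sys_pop are hypothetical. No bounds checking is done, which is exactly why the stack area must be sized for the worst case.

```c
#include <stdint.h>

/* Hypothetical model: mem[] stands in for data memory and sp for the SP
   register. sp always indexes the last word pushed. */
typedef struct {
    uint32_t mem[64];
    uint32_t sp;
} sys_stack;

void sys_push(sys_stack *s, uint32_t value)
{
    s->mem[++s->sp] = value;    /* preincrement SP, then store */
}

uint32_t sys_pop(sys_stack *s)
{
    return s->mem[s->sp--];     /* load, then postdecrement SP */
}
```

After two pushes starting from sp = 10, the values sit at mem[11] and mem[12] and sp is 12; two pops return them in reverse order and leave sp at 10 again.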


The stack pointer can be read from as well as written to; multiple stacks can
be created by updating the SP. The SP is not initialized by the hardware during
reset; it is important to remember to initialize its value so that it points to a
predetermined memory location. Example 1-1 on page 1-5 shows how to
initialize the SP. You must initialize the stack to a valid free memory space.
Otherwise, use of the stack can corrupt data or program memory.


The program counter is pushed onto the system stack on subroutine calls,
traps, and interrupts. It is popped from the system stack on returns. The PUSH,
POP, PUSHF, and POPF instructions push and pop the system stack. The
stack can be used inside subroutines for temporary storage of registers, as in
Example 2-1 on page 2-3.


Two instructions, PUSHF and POPF, are for floating-point numbers. These
instructions push and pop floating-point numbers from and to registers R0-R7.
This feature is very useful for saving the extended-precision registers (see
Example 2-1 and Example 2-2). PUSH saves the lower 32 bits of an
extended-precision register, and PUSHF saves the upper 32 bits. To recover
this extended-precision number, execute a POPF followed by POP. It is
important to perform the integer and floating-point PUSH and POP in the
above order, since POPF forces the last eight bits of the extended-precision
registers to 0.


2.2.2 User Stacks


User stacks can be built to store data from low-to-high memory or from high-to-
low memory. Two cases for each type of stack are shown. You can build stacks
by using the preincrement/decrement and postincrement/decrement modes
of modifying the auxiliary registers (AR).


You can implement stack growth from high to low memory in two ways: 


1) Store to memory using *--ARn to push data onto the stack and read from
   memory using *ARn++ to pop data off the stack.


2) Store to memory using *ARn-- to push data onto the stack and read from
   memory using *++ARn to pop data off the stack.


Figure 2-2 illustrates these two cases. The only difference is that in
Figure 2-2 (a), the AR always points to the top of the stack, and in
Figure 2-2 (b), the AR always points to the next free location on the stack.


Figure 2-2. Implementations of High-to-Low Memory Stacks


(a) Store to memory using *--ARn and    (b) Store to memory using *ARn-- and
    read from memory using *ARn++           read from memory using *++ARn

          Low memory                              Low memory
          (Free)                         ARn -->  (Free)
 ARn -->  Top of stack                            Top of stack
          Bottom of stack                         Bottom of stack
          High memory                             High memory


You can implement stack growth from low to high memory in two ways: 


1) Store to memory using *++ARn to push data onto the stack and read from
   memory using *ARn-- to pop data off the stack.


2) Store to memory using *ARn++ to push data onto the stack and read from
   memory using *--ARn to pop data off the stack.


Figure 2-3 illustrates these two cases. In Figure 2-3 (a), the AR always points
to the top of the stack, and in Figure 2-3 (b), the AR always points to the next
free location on the stack.


Figure 2-3. Implementations of Low-to-High Memory Stacks


(a) Store to memory using *++ARn and    (b) Store to memory using *ARn++ and
    read from memory using *ARn--           read from memory using *--ARn

          Low memory                              Low memory
          Bottom of stack                         Bottom of stack
 ARn -->  Top of stack                            Top of stack
          (Free)                         ARn -->  (Free)
          High memory                             High memory
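

The four push/pop pairings map directly onto C pointer expressions, which may help when checking which form leaves the pointer on the top element and which leaves it on the next free word. In this sketch a plain C pointer plays the role of the auxiliary register; all function names are hypothetical, and the caller is assumed to keep the pointer inside a valid buffer.

```c
#include <stdint.h>

/* Each function takes the address of the "auxiliary register", modeled
   here as a C pointer into a caller-supplied buffer. */

/* High to low, case 1: *--ARn pushes, *ARn++ pops (AR points at the top). */
void push_hl_top(uint32_t **ar, uint32_t v)  { *--(*ar) = v; }
uint32_t pop_hl_top(uint32_t **ar)           { return *(*ar)++; }

/* High to low, case 2: *ARn-- pushes, *++ARn pops (AR points at next free). */
void push_hl_free(uint32_t **ar, uint32_t v) { *(*ar)-- = v; }
uint32_t pop_hl_free(uint32_t **ar)          { return *++(*ar); }

/* Low to high, case 1: *++ARn pushes, *ARn-- pops (AR points at the top). */
void push_lh_top(uint32_t **ar, uint32_t v)  { *++(*ar) = v; }
uint32_t pop_lh_top(uint32_t **ar)           { return *(*ar)--; }

/* Low to high, case 2: *ARn++ pushes, *--ARn pops (AR points at next free). */
void push_lh_free(uint32_t **ar, uint32_t v) { *(*ar)++ = v; }
uint32_t pop_lh_free(uint32_t **ar)          { return *--(*ar); }
```

In every pairing, a push followed by the matching pop returns the pointer to where it started, so the two operations must always be taken from the same case.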
2.2.3 Queues and Double-Ended Queues


The implementation of queues and double-ended queues is based on the
manipulation of the auxiliary registers for user stacks.


For queues, two auxiliary registers are used: one to mark the front of the queue 
from which data is popped and the other to mark the rear of the queue to where 
data is pushed. 


For double-ended queues, two auxiliary registers are also necessary. One 
register marks one end of the double-ended queue, and the other register 
marks the other end. Data can be popped from or pushed onto either end. 
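

A minimal C sketch of the two-pointer queue follows. It assumes a fixed-size buffer that is filled once from low to high addresses without wrapping; the indices front and rear stand in for the two auxiliary registers, and the names word_queue, queue_put, and queue_get are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define QUEUE_WORDS 16   /* arbitrary capacity chosen for this sketch */

typedef struct {
    uint32_t mem[QUEUE_WORDS];
    uint32_t front;   /* index of the next word to remove */
    uint32_t rear;    /* index of the next free word */
} word_queue;

/* Insert at the rear, comparable to a store through *ARn++. */
bool queue_put(word_queue *q, uint32_t v)
{
    if (q->rear == QUEUE_WORDS)
        return false;             /* no room left; this sketch does not wrap */
    q->mem[q->rear++] = v;
    return true;
}

/* Remove from the front, comparable to a load through *ARm++. */
bool queue_get(word_queue *q, uint32_t *out)
{
    if (q->front == q->rear)
        return false;             /* queue is empty */
    *out = q->mem[q->front++];
    return true;
}
```

A double-ended queue would add the mirrored operations (remove at the rear, insert at the front) using the predecrement forms on the same two indices.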




2.3 Interrupt Service Routines 


Interrupts on the ’C3x are prioritized and vectored. When an interrupt occurs, 
the corresponding flag is set in the interrupt flag (IF) register. If the correspond- 
ing bit in the interrupt enable (IE) register is set and interrupts are enabled by 
having the global interrupt enable (GIE) bit in the status register set to 1, interrupt 
processing begins. You can also write to the IF register, allowing you to force 
an interrupt by software or to clear interrupts without processing them. 


2.3.1 Correct Interrupt Programming


For interrupts to work properly you must execute the following sequence of
steps, as shown in Example 1-1:


1) Create and place an interrupt-vector table in the appropriate memory
   location.
2) Initialize the ITTP bit field ('C32 only).
3) Create a software stack.
4) Enable the specific interrupt.
5) Enable global interrupts.
6) Generate the interrupt signal.


2.3.2 Software Polling of Interrupts 


The interrupt flag register can be polled and action can be taken, depending 
on whether an interrupt has occurred. This is true even when maskable inter- 
rupts are disabled. This can be useful when an interrupt-driven interface is not 
implemented. Example 2—2 shows the case in which a subroutine is called 
when external interrupt 1 has not occurred. 


Example 2-2. Use of Interrupts for Software Polling 


* TITLE INTERRUPT POLLING
*
        TSTB  40H,IF          ; Test if interrupt 1 has occurred
        CALLZ SUBROUTINE      ; If not, call subroutine


When interrupt processing begins, the program counter (PC) is pushed onto the 
stack, and the interrupt vector is loaded into the PC. Interrupts are then disabled 
by clearing the GIE bit to 0, and the program continues from the address loaded 
in the PC. Since all interrupts are disabled, interrupt processing can proceed 
without further interruption, unless the interrupt service routine reenables
interrupts.


2.3.3 Interrupt Priority


Interrupts on the 'C3x are automatically prioritized. This allows interrupts that
occur simultaneously to be serviced in a predefined order. Infrequent (but
lengthy) interrupt service routines (ISRs) might need to be interrupted by more
frequently occurring interrupts. In Example 2-3, the ISR for INT2 temporarily
modifies the IE register to permit interrupt processing when an interrupt to
INT0 (but no other interrupt) occurs. When the routine finishes processing, the
IE register is restored to its original state. The RETIcond instruction not only
pops the next program counter address from the stack, but also sets the GIE
bit of the status register. This enables all interrupts that have their interrupt
enable bit set.


Example 2-3. Interrupt Service Routine 


* TITLE INTERRUPT SERVICE ROUTINE
*
        .global ISR2
*
ENABLE  .set  2000h
MASK    .set  1
*
* INTERRUPT PROCESSING FOR EXTERNAL INTERRUPT INT2-
*
ISR2:
        PUSH  ST              ; Save status register
        PUSH  DP              ; Save data page pointer
        PUSH  IE              ; Save interrupt enable register
        PUSH  R0              ; Save lower 32 bits and
        PUSHF R0              ; upper 32 bits of R0
        PUSH  R1              ; Save lower 32 bits and
        PUSHF R1              ; upper 32 bits of R1
        LDI   MASK,IE         ; Unmask only INT0
        OR    ENABLE,ST       ; Enable all interrupts
*
* MAIN PROCESSING SECTION FOR ISR2
*
        XOR   ENABLE,ST       ; Disable all interrupts
        POPF  R1              ; Restore upper 32 bits and
        POP   R1              ; lower 32 bits of R1
        POPF  R0              ; Restore upper 32 bits and
        POP   R0              ; lower 32 bits of R0
        POP   IE              ; Restore interrupt enable register
        POP   DP              ; Restore data page register
        POP   ST              ; Restore status register
*
        RETI                  ; Return and enable interrupts


2.4 Context Switching in Interrupts and Subroutines


Context switching is commonly required during the processing of subroutine 
calls or interrupts. It can be extensive or simple, depending on system require- 
ments. On the ’C3x, the program counter is automatically pushed onto the 
stack. Important information in other ’C3x registers, such as the status, auxilia- 
ry, or extended-precision registers, must be saved by special commands. To 
preserve the state of the status register, push it first and pop it last. This keeps 
the restoration of the extended-precision registers from affecting the status 
register. 


Example 2-4 on page 2-13 and Example 2—5 on page 2-15 show saving and 
restoring the context of the ’C3x. In both examples, the stack expands towards 
higher addresses and is used for saving the registers. If you do not want to use 
the stack pointed at by SP, you can create a separate stack by using an auxilia- 
ry register as the stack pointer. Registers saved in these examples are: 


- Extended-precision registers (R7 through R0)
- Auxiliary registers (AR7 through AR0)
- Data-page pointer (DP)
- Index registers (IR0 and IR1)
- Block-size register (BK)
- Status register (ST)
- Interrupt-related registers (IE and IF)
- I/O flag (IOF)
- Repeat-related registers (RS, RE, and RC)


You must preserve only the registers that are modified inside of your subrou-
tine or interrupt/trap service routine and that could potentially affect the pre-
vious context environment.


If the previous context environment was in C, then your program must perform
one of two tasks:


- If the program is in a subroutine, it must preserve the dedicated C registers
  as follows:

      Save as Integers              Save as Floating-Point
      R4    R5                      R6    R7
      AR4   AR5
      AR6   AR7
      FP    DP (small model only)
      SP

- If the program is in an interrupt service routine, it must preserve all of the
  'C3x registers (see Example 2-6 on page 2-17).


If the previous context environment was in assembly language, you must de-
termine which registers to save, based on the operations of your assembly-
language code.


Note:

The status register must be saved first and restored last to preserve the
processor status without further change caused by other context-switching
instructions.


Example 2-4. Context Save for the TMS320C3x


*
* TITLE CONTEXT SAVE FOR THE TMS320C3x
*
        .global SAVE
*
* CONTEXT SAVE ON SUBROUTINE CALL OR INTERRUPT
*
SAVE:
        PUSH  ST              ; Save status register
*
* SAVE THE EXTENDED PRECISION REGISTERS
*
        PUSH  R0              ; Save the lower 32 bits
        PUSHF R0              ; and the upper 32 bits of R0
        PUSH  R1              ; Save the lower 32 bits
        PUSHF R1              ; and the upper 32 bits of R1
        PUSH  R2              ; Save the lower 32 bits
        PUSHF R2              ; and the upper 32 bits of R2
        PUSH  R3              ; Save the lower 32 bits
        PUSHF R3              ; and the upper 32 bits of R3
        PUSH  R4              ; Save the lower 32 bits
        PUSHF R4              ; and the upper 32 bits of R4
        PUSH  R5              ; Save the lower 32 bits
        PUSHF R5              ; and the upper 32 bits of R5
        PUSH  R6              ; Save the lower 32 bits
        PUSHF R6              ; and the upper 32 bits of R6
        PUSH  R7              ; Save the lower 32 bits
        PUSHF R7              ; and the upper 32 bits of R7
*
* SAVE THE AUXILIARY REGISTERS
*
        PUSH  AR0             ; Save AR0
        PUSH  AR1             ; Save AR1
        PUSH  AR2             ; Save AR2
        PUSH  AR3             ; Save AR3
        PUSH  AR4             ; Save AR4
        PUSH  AR5             ; Save AR5
        PUSH  AR6             ; Save AR6
        PUSH  AR7             ; Save AR7
*
* SAVE THE REST REGISTERS FROM THE REGISTER FILE
*
        PUSH  DP              ; Save data page pointer
        PUSH  IR0             ; Save index register IR0
        PUSH  IR1             ; Save index register IR1
        PUSH  BK              ; Save block-size register
        PUSH  IE              ; Save interrupt enable register
        PUSH  IF              ; Save interrupt flag register
        PUSH  IOF             ; Save I/O flag register
        PUSH  RS              ; Save repeat start address
        PUSH  RE              ; Save repeat end address
        PUSH  RC              ; Save repeat counter
*
* SAVE IS COMPLETE
*


Example 2-5. Context Restore for the TMS320C3x


*
* TITLE CONTEXT RESTORE FOR THE TMS320C3x
*
        .global RESTR
*
* CONTEXT RESTORE AT THE END OF A SUBROUTINE CALL OR INTERRUPT
*
RESTR:
*
* RESTORE THE REST REGISTERS FROM THE REGISTER FILE
*
        POP   RC              ; Restore repeat counter
        POP   RE              ; Restore repeat end address
        POP   RS              ; Restore repeat start address
        POP   IOF             ; Restore I/O flag register
        POP   IF              ; Restore interrupt flag register
        POP   IE              ; Restore interrupt enable register
        POP   BK              ; Restore block-size register
        POP   IR1             ; Restore index register IR1
        POP   IR0             ; Restore index register IR0
        POP   DP              ; Restore data page pointer
*
* RESTORE THE AUXILIARY REGISTERS
*
        POP   AR7             ; Restore AR7
        POP   AR6             ; Restore AR6
        POP   AR5             ; Restore AR5
        POP   AR4             ; Restore AR4
        POP   AR3             ; Restore AR3
        POP   AR2             ; Restore AR2
        POP   AR1             ; Restore AR1
        POP   AR0             ; Restore AR0
*
* RESTORE THE EXTENDED PRECISION REGISTERS
*
        POPF  R7              ; Restore the upper 32 bits and
        POP   R7              ; the lower 32 bits of R7
        POPF  R6              ; Restore the upper 32 bits and
        POP   R6              ; the lower 32 bits of R6
        POPF  R5              ; Restore the upper 32 bits and
        POP   R5              ; the lower 32 bits of R5
        POPF  R4              ; Restore the upper 32 bits and
        POP   R4              ; the lower 32 bits of R4
        POPF  R3              ; Restore the upper 32 bits and
        POP   R3              ; the lower 32 bits of R3
        POPF  R2              ; Restore the upper 32 bits and
        POP   R2              ; the lower 32 bits of R2
        POPF  R1              ; Restore the upper 32 bits and
        POP   R1              ; the lower 32 bits of R1
        POPF  R0              ; Restore the upper 32 bits and
        POP   R0              ; the lower 32 bits of R0
        POP   ST              ; Restore status register
*
* RESTORE IS COMPLETE
*


2.5 Delayed Branches


The ’C3x uses delayed branches to create single-cycle branching. The 
delayed branches operate like regular branches but do not flush the pipeline. 
Instead, the three instructions following a delayed branch are also executed. 
As discussed in the Program Flow Control chapter of the TMS320C3x User's
Guide, the only limitations are that none of the three instructions following a
delayed branch may be a:


- Branch (standard or delayed)
- Call to a subroutine
- Return from a subroutine
- Return from an interrupt
- Repeat instruction
- TRAP instruction
- IDLE instruction


Conditional delayed branches use the conditions that exist at the end of the 
instruction immediately preceding the delayed branch. Sometimes a branch 
is necessary in the flow of a program, but fewer than three instructions can be 
placed after a delayed branch. For faster execution, it is still advantageous to 
use a delayed branch. This is shown in Example 2-6, with no operations per- 
formed (NOPs) taking the place of the unused instructions. The trade-off is 
more instruction words for less execution time. 


Example 2-6. Delayed Branch Execution 


* TITLE DELAYED BRANCH EXECUTION
*
        LDF   *+AR1(5),R2     ; Load contents of memory to R2
        BGED  SKIP            ; If loaded number >=0, branch (delayed)
        LDFN  R2,R1           ; If loaded number <0, load it to R1
        SUBF  3.0,R1          ; Subtract 3 from R1
        NOP                   ; Dummy operation to complete delayed
*                             ; branch
        MPYF  1.5,R1          ; Continue here if loaded number <0
SKIP    LDF   R1,R3           ; Continue here if loaded number >=0


2.6 Repeat Modes


2.6.1 Block Repeat 


The 'C3x supports looping without any overhead. For that purpose, there are
two instructions: RPTB, which repeats a block of code, and RPTS, which
repeats a single instruction. There are three control registers: repeat
start-address (RS), repeat end-address (RE), and repeat counter (RC). These
contain the parameters that specify loop execution. See the Program Flow
Control chapter in the TMS320C3x User's Guide for a complete description of
RPTB and RPTS. The RS and RE registers are set automatically when the RPTB
and RPTS instructions are executed; however, you must set the repeat counter
register.


Example 2—7 shows an application of the block repeat construct. In this exam- 
ple, an array of 64 elements is flipped over by exchanging the elements that 
are equidistant from the end of the array. In other words, the original array is: 


a(1), a(2),..., a(31), a(32),..., a(64) 
The final array after the rearrangement is as follows: 
a(64), a(63),..., a(32), a(31),..., a(1) 


Because the exchange operation is performed on two elements simultaneously, 
it requires 32 operations. The repeat counter register is initialized to 31. In gener- 
al, if RC contains the number N, the loop is executed N + 1 times. The loop is 
defined by the RPTB instruction and the EXCH label. 
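

The counting convention (a repeat counter of 31 gives 32 passes) is easy to get wrong by one, so the following C sketch spells it out. It is only a behavioral analogy for Example 2-7, not a statement about how the hardware loop is implemented; the function name flip64 and the use of int elements are assumptions.

```c
/* Reverse a 64-element array in place, mirroring Example 2-7. */
void flip64(int *a)
{
    int *lo = a;          /* AR0: start of the array */
    int *hi = a + 63;     /* AR1: end of the 64-element array */
    int rc = 31;          /* repeat counter: the body runs rc + 1 = 32 times */

    do {
        int r0 = *lo;     /* LDI *AR0,R0 */
        int r1 = *hi;     /* LDI *AR1,R1 */
        *lo++ = r1;       /* STI R1,*AR0++(1) */
        *hi-- = r0;       /* STI R0,*AR1--(1) */
    } while (rc-- > 0);
}
```

The do/while form is used deliberately: like the block repeat, the body always executes at least once, and the counter is tested after each pass.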


Example 2-7. Loop Using Block Repeat


* TITLE LOOP USING BLOCK REPEAT
*
* THIS CODE SEGMENT EXCHANGES THE VALUES OF ARRAY ELEMENTS THAT ARE
* SYMMETRIC AROUND THE MIDDLE OF THE ARRAY.
*
        LDI   @ADDR,AR0       ; AR0 points to the beginning of the array
        LDI   AR0,AR1
        ADDI  63,AR1          ; AR1 points to the end of the
*                             ; 64-element array
        LDI   31,RC           ; Initialize repeat counter
*
        RPTB  EXCH            ; Repeat RC+1 times between here and
*                             ; EXCH
        LDI   *AR0,R0         ; Load one memory element in R0,
        LDI   *AR1,R1         ; and the other in R1
        STI   R1,*AR0++(1)    ; Then, exchange their locations
EXCH    STI   R0,*AR1--(1)


The Program Flow Control chapter in the TMS320C3x User's Guide discusses
restrictions in the block-repeat construct. According to the contents of
registers RS, RE, and RC, the program counter is modified at the end of the loop.
Therefore, no operation should attempt to modify the repeat counter or the
program counter at the end of the loop.


It is possible to nest repeat blocks; however, there is only one set of control
registers: RS, RE, and RC. It is necessary to save these registers before entering
an inside loop. Alternatively, you can implement a nested loop by using a
register as a counter and then using a delayed branch, rather than using the
nested repeat block approach.


Example 2-8 shows how to use the block repeat to find a maximum of 147
numbers.


Example 2-8. Use of Block Repeat to Find a Maximum


*
* TITLE USE OF BLOCK REPEAT TO FIND A MAXIMUM
*
* THIS ROUTINE FINDS THE MAXIMUM OF N = 147 NUMBERS.
*
        LDI   146,RC          ; Initialize repeat counter to 147-1
        LDI   @ADDR,AR0       ; AR0 points to beginning of array
        LDF   *AR0++(1),R0    ; Initialize MAX to the first value
*
        RPTB  LOOP
        CMPF  *AR0++(1),R0    ; Compare number to the maximum
LOOP    LDFLT *-AR0(1),R0     ; If greater, this is a new maximum


2.6.2 Single-Instruction Repeat 


The single-instruction repeat uses the control registers RS, RE, and RC in the 
same way as the block repeat. The advantage over the block repeat is that the 
instruction is fetched only once, and then the buses are available for moving 
operands. The single-instruction repeat construct is not interruptible; the block 
repeat is interruptible. 


Example 2-9 shows an application of the single-repeat construct. In this
example, the sum of the products of two arrays is computed. The arrays are not
necessarily different. If the arrays are a(i) and b(i), each of length N = 512,
then register R0 contains this quantity after computation:


a(1) b(1) + a(2) b(2) + ... + a(N) b(N)


The value of the RC is specified to be 511 in the instruction. If RC contains the
number N, the loop is executed N + 1 times.


Example 2-9. Loop Using Single Repeat


* TITLE LOOP USING SINGLE REPEAT
*
* THIS CODE SEGMENT COMPUTES SUM[a(i)b(i)] FOR i = 1 to N.
*
*
        LDI   @ADDR1,AR0      ; AR0 points to array a(i)
        LDI   @ADDR2,AR1      ; AR1 points to array b(i)
*
        LDF   0.0,R0          ; Initialize R0
*
        MPYF3 *AR0++(1),*AR1++(1),R1
*                             ; Compute first product
        RPTS  511             ; Repeat 512 times
*
        MPYF3 *AR0++(1),*AR1++(1),R1  ; Compute next product
||      ADDF3 R1,R0,R0        ; and accumulate the
*                             ; previous one
        ADDF  R1,R0           ; One final addition


2.7 Computed GOTOs


It is occasionally convenient to select the subroutine to be executed during run
time (and not during assembly). The 'C3x's computed GOTO instruction
supports this selection. The computed GOTO is implemented using the CALLcond
instruction in the register-addressing mode. This instruction uses the contents
of the register as the address of the call. Example 2-10 shows a computed
GOTO for a task controller.
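

In C, the closest analogy to a call through a register is a call through a function pointer taken from a table. The sketch below follows the table order and index stepping of the task controller in Example 2-10; the task bodies, the last_task variable, and the name run_next_task are hypothetical and exist only so the selection can be observed.

```c
typedef void (*task_fn)(void);

int last_task = -1;   /* records which task ran; for illustration only */

void task0(void) { last_task = 0; }
void task1(void) { last_task = 1; }
void task2(void) { last_task = 2; }
void task3(void) { last_task = 3; }
void task4(void) { last_task = 4; }
void task5(void) { last_task = 5; }

/* Same ordering as TSKSEQ in Example 2-10: entry 0 is TASK5, entry 5 is TASK0. */
const task_fn task_table[6] = { task5, task4, task3, task2, task1, task0 };

/* Run the task selected by *index, then step the index the way the
   assembly steps R0: decrement, and reload 5 when it goes below 0. */
void run_next_task(int *index)
{
    task_fn task = task_table[*index];   /* address chosen at run time */
    if (--*index < 0)
        *index = 5;
    task();                              /* comparable to CALLU R1 */
}
```

Starting with the index at 5, successive calls run task0 through task5 in order and then start over, which is the sequence the assembly routine produces between IDLE instructions.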


Example 2-10. Computed GOTO 


* TITLE COMPUTED GOTO
*
* TASK CONTROLLER
*
* THIS MAIN ROUTINE CONTROLS THE ORDER OF TASK EXECUTION (6 TASKS
* IN THE PRESENT EXAMPLE). TASK0 THROUGH TASK5 ARE THE NAMES OF
* SUBROUTINES TO BE CALLED. THEY ARE EXECUTED IN ORDER, TASK0,
* TASK1, . . .TASK5. WHEN AN INTERRUPT OCCURS, THE INTERRUPT
* SERVICE ROUTINE IS EXECUTED, AND THE PROCESSOR CONTINUES
* WITH THE INSTRUCTION FOLLOWING THE IDLE INSTRUCTION. THIS
* ROUTINE SELECTS THE TASK APPROPRIATE FOR THE CURRENT CYCLE,
* CALLS THE TASK AS A SUBROUTINE, AND BRANCHES BACK TO THE IDLE
* TO WAIT FOR THE NEXT SAMPLE INTERRUPT WHEN THE SCHEDULED TASK
* HAS COMPLETED EXECUTION. R0 HOLDS THE OFFSET FROM THE BASE
* ADDRESS OF THE TASK TO BE EXECUTED.
*
*
        LDI   5,R0            ; Initialize R0
        LDI   @ADDR,AR1       ; AR1 holds base address of the table
WAIT    IDLE                  ; Wait for the next interrupt
        ADDI3 AR1,R0,AR2      ; Add the base address to the table
*                             ; entry number
        SUBI  1,R0            ; Decrement R0
        LDILT 5,R0            ; If R0<0, reinitialize it to 5
        LDI   *AR2,R1         ; Load the task address
        CALLU R1              ; Execute appropriate task
        BR    WAIT
*
TSKSEQ  .word TASK5           ; Address of TASK5
        .word TASK4           ; Address of TASK4
        .word TASK3           ; Address of TASK3
        .word TASK2           ; Address of TASK2
        .word TASK1           ; Address of TASK1
        .word TASK0           ; Address of TASK0
ADDR    .word TSKSEQ


Logical and Arithmetic Operations


This chapter describes the 'C3x instruction set, which supports both integer and
floating-point arithmetic and logical operations. These instructions can be
combined to form more complex operations.


3.1 Bit Manipulation


Instructions for logical operations, such as AND, OR, NOT, ANDN, and XOR, 
can be used with the shift instructions for bit manipulation. A special instruction 
called TSTB tests bits. TSTB performs the same operation as AND, but the 
result of the logical AND is only used to set the condition flags and is not written 
anywhere. Example 3—1 and Example 3-2 demonstrate the use of these in- 
structions for bit manipulation and testing. 
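

For readers more familiar with C, the bit-copy operation of Example 3-2 can be sketched as follows. The function name copy_bit is hypothetical; the comments point to the instruction each step loosely corresponds to, and the sketch assumes 32-bit words with bit positions in the range 0 to 31.

```c
#include <stdint.h>

/* Copy bit i of src into bit j of dst and return the new dst.
   i and j must be in the range 0..31. */
uint32_t copy_bit(uint32_t src, unsigned i, uint32_t dst, unsigned j)
{
    uint32_t mask = (uint32_t)1 << j;    /* LSH aligns a 1 with bit j */

    if (src & ((uint32_t)1 << i))        /* TSTB: AND used only as a test */
        return dst | mask;               /* OR sets bit j */
    return dst & ~mask;                  /* ANDN clears bit j */
}
```

For example, copying bit 2 of 0x4 into bit 5 of 0x0 gives 0x20, and copying bit 2 of 0x0 into bit 5 of 0xFF gives 0xDF.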


Example 3-1. Use of TSTB for Software-Controlled Interrupt 


* TITLE USE OF TSTB FOR SOFTWARE-CONTROLLED INTERRUPT
*
* IN THIS EXAMPLE, ALL INTERRUPTS HAVE BEEN DISABLED BY
* RESETTING THE GIE BIT OF THE STATUS REGISTER. WHEN AN
* INTERRUPT ARRIVES, IT IS STORED IN THE IF REGISTER. THE
* PRESENT EXAMPLE ACTIVATES THE INTERRUPT SERVICE ROUTINE INTR
* WHEN IT DETECTS THAT INT2- HAS OCCURRED.
*
        TSTB  0100b,IF        ; Check if bit 2 of IF is set,
        CALLNZ INTR           ; and, if so, call subroutine INTR


Bit Manipulation 


Example 3-2. Copy a Bit From One Location to Another 


*  TITLE COPY A BIT FROM ONE LOCATION TO ANOTHER
*
*  BIT I OF R1 NEEDS TO BE COPIED TO BIT J OF R2.
*  AR0 POINTS TO A LOCATION HOLDING I, AND IT IS ASSUMED THAT THE
*  NEXT MEMORY LOCATION HOLDS THE VALUE J.
*
*  MEMORY LAYOUT:   *AR0       HOLDS I (BIT POSITION IN R1)
*                   *+AR0(1)   HOLDS J (BIT POSITION IN R2)
*
         LDI    1,R0
         LSH    *AR0,R0       ; Shift 1 to align it with bit I
         TSTB   R1,R0         ; Test the Ith bit of R1
         BZD    CONT          ; If bit = 0, branch delayed
         LDI    1,R0
         LSH    *+AR0(1),R0   ; Align 1 with Jth location
         ANDN   R0,R2         ; If bit = 0, reset Jth bit of R2
         OR     R0,R2         ; If bit = 1, set Jth bit of R2

CONT
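
The bit-copy logic can be cross-checked on a host machine. The following is a minimal Python sketch, not 'C3x code; the function name `copy_bit` and its argument order are assumptions made for illustration. It mirrors the sequence above: the destination bit is always cleared (the ANDN executes in the delay slots of the branch), and it is set again only when the tested source bit is 1.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as 32-bit unsigned values


def copy_bit(r1: int, r2: int, i: int, j: int) -> int:
    """Return r2 with bit j replaced by bit i of r1 (hypothetical helper)."""
    bit_is_set = (r1 & (1 << i)) != 0   # TSTB: AND that only sets the flags
    r2 &= ~(1 << j) & MASK32            # ANDN: always clear bit j
    if bit_is_set:
        r2 |= 1 << j                    # OR: set bit j when the source bit is 1
    return r2 & MASK32
```

For instance, `copy_bit(0b0100, 0, 2, 5)` copies bit 2 of the first value into bit 5 of the second.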
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3.2 Block Moves 


Since the ’C3x addresses a large amount of memory, blocks of data or program
code can be stored off-chip in slow memories and then loaded on-chip for
faster execution. Data can also be moved from on-chip to off-chip memory for
storage or for multiprocessor data transfers.


You can use direct memory access (DMA) in parallel with CPU operations to
accomplish such data transfers. The DMA operation is explained in detail in
the Programming the DMA Coprocessor chapter later in the book. An alternative
to DMA is to perform data transfers under program control using load and
store instructions in a repeat mode. Example 3-3 shows the transfer of a
block of 512 floating-point numbers from external memory to block 1 of the
on-chip RAM.


Example 3-3. Block Move Under Program Control 


*  TITLE BLOCK MOVE UNDER PROGRAM CONTROL
*
extern   .word  01000H
block1   .word  0809C00H

         LDI    @extern,AR0   ; Source address
         LDI    @block1,AR1   ; Destination address
         LDF    *AR0++,R0     ; Load the first number
         RPTS   510           ; Repeat following instruction 511 times
         LDF    *AR0++,R0     ; Load the next number, and...
||       STF    R0,*AR1++     ; store the previous one
         STF    R0,*AR1       ; Store the last number


3.3 Bit-Reversed Addressing 


The ’C3x can implement fast Fourier transforms (FFTs) with bit-reversed
addressing. If the data to be transformed is in the correct order, the final
result of the FFT is presented in bit-reversed order. To recover the
frequency-domain data in the correct order, you must swap certain memory
locations. The bit-reversed addressing mode makes swapping unnecessary: the
next time the data needs to be accessed, the access is performed in a
bit-reversed manner rather than sequentially. The base address of
bit-reversed addressing must be located on a boundary the size of the table.
For example, if IR0 = 2^(n-1), the n least significant bits (LSBs) of the
base address must be 0.


In bit-reversed addressing, IR0 holds a value equal to one half the size of
the FFT if real and imaginary data are stored in separate arrays. During
accessing, the auxiliary register is indexed by IR0, but with reverse carry
propagation. Example 3-4 illustrates a 512-point complex FFT being moved from
the place of computation (pointed at by AR0) to a location pointed at by AR1.
In this example, real and imaginary parts, XR(i) and XI(i), of the data are
not stored in separate arrays. They are interleaved as XR(0), XI(0), XR(1),
XI(1), ..., XR(N-1), XI(N-1). Because of this arrangement, the length of the
array is 2N instead of N, and IR0 is set to 512 instead of 256.


Example 3-4. Bit-Reversed Addressing 


*
*  TITLE BIT-REVERSED ADDRESSING
*
*  THIS EXAMPLE MOVES THE RESULT OF THE 512-POINT FFT
*  COMPUTATION POINTED AT BY AR0 TO A LOCATION POINTED AT
*  BY AR1. REAL AND IMAGINARY POINTS ARE ALTERNATING.
*
         LDI    512,IR0
         LDI    2,IR1
         LDI    511,RC             ; Repeat 511+1 times
         LDF    *+AR0(1),R1        ; Load first imaginary point
         RPTB   LOOP
*
         LDF    *AR0++(IR0)B,R0    ; Load real value (and point
||       STF    R1,*+AR1(1)        ; to next location) and store
*                                  ; the imaginary value
LOOP     LDF    *+AR0(1),R1        ; Load next imaginary point and store
||       STF    R0,*AR1++(IR1)     ; previous real value
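
The address sequence produced by indexing with reverse carry propagation can be illustrated off-target. The following Python sketch is a model only, under the assumption that the base address is aligned so that only table offsets matter; the helper names `reverse_carry_add` and `bit_reversed_offsets` are hypothetical. Adding with reverse carry is modeled by bit-reversing both operands, adding normally, and bit-reversing the sum.

```python
def reverse_carry_add(offset: int, step: int, bits: int) -> int:
    """Add step to offset with carries propagating toward the LSB."""
    def reverse(value: int) -> int:
        # Reverse the lowest `bits` bits of value.
        return int(format(value, f"0{bits}b")[::-1], 2)

    mask = (1 << bits) - 1
    return reverse((reverse(offset) + reverse(step)) & mask)


def bit_reversed_offsets(step: int, bits: int, count: int) -> list:
    """Offsets visited by `count` accesses, starting at offset 0."""
    offsets = []
    offset = 0
    for _ in range(count):
        offsets.append(offset)
        offset = reverse_carry_add(offset, step, bits)
    return offsets
```

With a step of 4 over a 3-bit table, the offsets are 0, 4, 2, 6, 1, 5, 3, 7. With a step of 512 over a 10-bit table (the interleaved 512-point case above), the first 512 accesses land on even offsets only, which is where the real parts are stored.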
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3.4 


Integer and Floating-Point Division


Although division is not implemented as a single instruction in the ’C3x, the
instruction set can perform an efficient division routine. Integer and
floating-point division are examined separately because a different algorithm
is used for each.


3.4.1 Integer Division


Division is implemented on the ’C3x by repeated subtractions using SUBC, a
special conditional subtract instruction. Consider the case of a 32-bit
positive dividend with i significant bits (and 32 - i sign bits), as well as
a 32-bit positive divisor with j significant bits (and 32 - j sign bits). The
repetition of the SUBC command i - j + 1 times produces a 32-bit result in
which the lower i - j + 1 bits are the quotient and the upper 31 - i + j bits
are the remainder of the division.


SUBC implements binary division in the same manner as long division. The
divisor, which is assumed to be smaller than the dividend, is shifted left
i - j times to align it with the dividend. Using SUBC, the shifted divisor is
subtracted from the dividend. For each subtraction that does not produce a
negative answer, the dividend is replaced by the difference. It is then
shifted to the left, and a 1 is put in the LSB. If the difference is
negative, the dividend is simply shifted left by 1, leaving a 0 in the LSB.
This operation is repeated i - j + 1 times.


As an example, consider the division of 33 by 5, using both long division and 
the SUBC method (see Figure 3-1). In this case, i = 6 and j = 3, so that the 
SUBC operation is repeated 6 — 3 + 1 = 4 times. 


Figure 3-1. Long Division and SUBC Method 


Long division:

                                   00000000000000000000000000000110   Quotient
00000000000000000000000000000101 | 00000000000000000000000000100001
                                                             -101
                                                               1101
                                                              -101
                                                                 11   Remainder


SUBC method:

00000000000000000000000000100001   Dividend
00000000000000000000000000101000   Divisor (aligned)
             Negative difference   (First SUBC command)

00000000000000000000000001000010   New dividend + quotient
00000000000000000000000000101000   Divisor
00000000000000000000000000011010   Difference (> 0) (second SUBC command)

00000000000000000000000000110101   New dividend + quotient
00000000000000000000000000101000   Divisor
00000000000000000000000000001101   Difference (> 0) (third SUBC command)

00000000000000000000000000011011   New dividend + quotient
00000000000000000000000000101000   Divisor
             Negative difference   (Fourth SUBC command)

00000000000000000000000000110110   Final result
                                   (upper 28 bits: remainder = 11,
                                    lower 4 bits: quotient = 0110)
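
The SUBC procedure described above can be checked with a small host-side model. The following Python sketch is illustrative only: the names `subc` and `divide_unsigned` are hypothetical, both operands are assumed to be positive and to fit in 31 bits, and the significant-bit counts i and j are taken with `int.bit_length()` rather than with the FLOAT-based normalization used on the ’C3x.

```python
MASK32 = 0xFFFFFFFF


def subc(dst: int, src: int) -> int:
    """Model of one conditional subtract: subtract and shift in a 1, or just shift."""
    diff = dst - src
    if diff >= 0:
        return ((diff << 1) | 1) & MASK32
    return (dst << 1) & MASK32


def divide_unsigned(dividend: int, divisor: int):
    """Return (quotient, remainder) using i - j + 1 conditional subtracts."""
    if divisor > dividend:
        return 0, dividend
    count = dividend.bit_length() - divisor.bit_length()   # i - j
    aligned = divisor << count                              # align divisor
    result = dividend
    for _ in range(count + 1):
        result = subc(result, aligned)
    quotient = result & ((1 << (count + 1)) - 1)            # lower i - j + 1 bits
    remainder = result >> (count + 1)                       # remaining upper bits
    return quotient, remainder
```

For the 33 / 5 case of Figure 3-1, the loop runs four times and leaves 110110 (binary) in the result, which splits into quotient 6 and remainder 3.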


When the SUBC command is used, both the dividend and the divisor must be
positive. Example 3-5 shows an example of integer division in which the sign
of the quotient is properly handled. The last instruction before returning
modifies the condition flag, in case subsequent operations depend on the sign
of the result.
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Example 3-5. Integer Division 


*
*  TITLE INTEGER DIVISION
*
*
*  SUBROUTINE DIVI
*
*
*  INPUTS:     SIGNED INTEGER DIVIDEND IN R0,
*              SIGNED INTEGER DIVISOR IN R1
*
*  OUTPUT:     R0/R1 into R0
*
*  REGISTERS USED: R0-R3, IR0, IR1
*
*  OPERATION:  1. NORMALIZE DIVISOR WITH DIVIDEND
*              2. REPEAT SUBC
*              3. QUOTIENT IS IN LSBs OF RESULT
*
*  CYCLES:     31-62 (DEPENDS ON AMOUNT OF NORMALIZATION)
*
         .globl DIVI

SIGN     .set   R2
TEMPF    .set   R3
TEMP     .set   IR0
COUNT    .set   IR1

*  DIVI - SIGNED DIVISION
*
DIVI:
*
*  DETERMINE SIGN OF RESULT. GET ABSOLUTE VALUE OF OPERANDS.
*
         XOR    R0,R1,SIGN    ; Get the sign
         ABSI   R0
         ABSI   R1
*
         CMPI   R0,R1         ; Divisor > dividend ?
         BGTD   ZERO          ; If so, return 0
*
*  NORMALIZE OPERANDS. USE DIFFERENCE IN EXPONENTS AS SHIFT COUNT
*  FOR DIVISOR AND AS REPEAT COUNT FOR 'SUBC'.
*
         FLOAT  R0,TEMPF      ; Normalize dividend
         PUSHF  TEMPF         ; PUSH as float
         POP    COUNT         ; POP as int
         LSH    -24,COUNT     ; Get dividend exponent
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Example 3-5. Integer Division (Continued) 


         FLOAT  R1,TEMPF      ; Normalize divisor
         PUSHF  TEMPF         ; PUSH as float
         POP    TEMP          ; POP as int
         LSH    -24,TEMP      ; Get divisor exponent
*
         SUBI   TEMP,COUNT    ; Get difference in exponents
         LSH    COUNT,R1      ; Align divisor with dividend
*
*  DO COUNT+1 SUBTRACT & SHIFTS.
*
         RPTS   COUNT
         SUBC   R1,R0
*
*  MASK OFF THE LOWER COUNT+1 BITS OF R0.
*
         SUBRI  31,COUNT      ; Shift count is (32 - (COUNT+1))
         LSH    COUNT,R0      ; Shift left
         NEGI   COUNT
         LSH    COUNT,R0      ; Shift right to get result
*
*  CHECK SIGN AND NEGATE RESULT IF NECESSARY.
*
         NEGI   R0,R1         ; Negate result
         ASH    -31,SIGN      ; Check sign
         LDINZ  R1,R0         ; If set, use negative result
         CMPI   0,R0          ; Set status from result
         RETS
*
*  RETURN 0.
*
ZERO:    LDI    0,R0
         RETS


If the dividend is less than the divisor and you want fractional division,
you can perform a division after you determine the desired accuracy of the
quotient in bits. If the desired accuracy is k bits, shift the dividend left
by k positions. Then apply the algorithm described above, with i replaced by
i + k. It is assumed that i + k is less than 32.
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3.4.2 Floating-Point Inverse and Division 


This section explains how to implement floating-point division on the ’C3x.
Since the algorithm outlined here computes the inverse of a number v, to
perform y / v, multiply y by the inverse of v.


The computation of 1 / v is based on the following iterative algorithm. At
the ith iteration, the estimate x[i] of 1 / v is computed from v and the
previous estimate x[i-1] according to the following formula:


     x[i] = x[i-1] × (2.0 - v × x[i-1])


To start the operation, an initial estimate x[0] is needed. If v = a × 2^e,
a good initial estimate is:


     x[0] = 1.0 × 2^(-e-1)


Example 3-6 shows the implementation of this algorithm on the ’C3x, where
the iteration has been applied five times. Both accuracy and speed are
affected by the number of iterations. The accuracy offered by the
single-precision floating-point format is 2^(-23) = 1.192E-7. If you want
more accuracy, use more iterations. If you want less accuracy, reduce the
number of iterations to decrease the execution time.


This algorithm properly treats the boundary conditions when the input number
either is 0 or has a very large value. When the input is 0, the exponent
e = -128. Then the calculation of x[0] yields an exponent that is equal to
-(-128) - 1 = 127, and the algorithm overflows and saturates. On the other
hand, in the case of a very large number with e = 127, the exponent of x[0]
is -127 - 1 = -128. This causes the algorithm to yield 0, which is reasonable
for handling that boundary condition.
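
The convergence of this iteration can be tried out on a host machine. The sketch below is a Python model in double precision, not the ’C3x routine; the function name `reciprocal` is hypothetical, the exponent e is obtained with `math.frexp` (adjusted so that the mantissa lies between 1 and 2, as in the text), and a zero input raises an exception instead of saturating as the ’C3x code does.

```python
import math


def reciprocal(v: float, iterations: int = 5) -> float:
    """Estimate 1/v with x[i] = x[i-1] * (2.0 - v * x[i-1])."""
    if v == 0.0:
        raise ZeroDivisionError("model does not reproduce the saturating case")
    a = abs(v)                        # the algorithm works on |v|
    _, e = math.frexp(a)              # a = m * 2**e with 0.5 <= m < 1
    e -= 1                            # now a = (2m) * 2**e with 1 <= 2m < 2
    x = math.ldexp(1.0, -e - 1)       # x[0] = 1.0 * 2**(-e-1)
    for _ in range(iterations):
        x = x * (2.0 - a * x)
    return math.copysign(x, v)        # restore the sign of v
```

With this starting value, the relative error 1 - v × x[0] is at most 0.5 and is squared by every iteration, so five iterations bring it below 2^(-32).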


Example 3-6. Inverse of a Floating-Point Number 


*
*  TITLE INVERSE OF A FLOATING-POINT NUMBER
*
*
*  SUBROUTINE INVF
*
*
*  THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE
*  COMPUTATION IS COMPLETED, 1/v IS ALSO STORED IN R0.
*
*  TYPICAL CALLING SEQUENCE:
*       LDF   v,R0
*       CALL  INVF
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+----------------------------------------------------
*  R0       | v = NUMBER TO FIND THE RECIPROCAL OF (UPON THE CALL)
*  R0       | 1/v (UPON THE RETURN)
*
*  REGISTER USED AS INPUT: R0
*  REGISTERS MODIFIED: R0, R1, R2, R3
*  REGISTER CONTAINING RESULT: R0
*
*  CYCLES: 35          WORDS: 32
*
*
         .global INVF
*
INVF:    LDF    R0,R3         ; v is saved for later
         ABSF   R0            ; The algorithm uses v = |v|
*
*  EXTRACT THE EXPONENT OF v.
*
         PUSHF  R0
         POP    R1
         ASH    -24,R1        ; The 8 LSBs of R1 contain the exponent
*                             ; of v
*
*  x[0] FORMATION IS GIVEN THE EXPONENT OF v.
*
         NEGI   R1
         SUBI   1,R1          ; Now we have -e-1, the exponent of x[0]
         ASH    24,R1
         PUSH   R1
         POPF   R1            ; Now R1 = x[0] = 1.0 * 2**(-e-1)
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Example 3-6. Inverse of a Floating-Point Number (Continued) 


*
*  NOW THE ITERATIONS BEGIN.
*
         MPYF   R1,R0,R2      ; R2 = v * x[0]
         SUBRF  2.0,R2        ; R2 = 2.0 - v * x[0]
         MPYF   R2,R1         ; R1 = x[1] = x[0] * (2.0 - v * x[0])
*
         MPYF   R1,R0,R2      ; R2 = v * x[1]
         SUBRF  2.0,R2        ; R2 = 2.0 - v * x[1]
         MPYF   R2,R1         ; R1 = x[2] = x[1] * (2.0 - v * x[1])
*
         MPYF   R1,R0,R2      ; R2 = v * x[2]
         SUBRF  2.0,R2        ; R2 = 2.0 - v * x[2]
         MPYF   R2,R1         ; R1 = x[3] = x[2] * (2.0 - v * x[2])
*
         MPYF   R1,R0,R2      ; R2 = v * x[3]
         SUBRF  2.0,R2        ; R2 = 2.0 - v * x[3]
         MPYF   R2,R1         ; R1 = x[4] = x[3] * (2.0 - v * x[3])
*
         RND    R1            ; This minimizes error in the LSBs
*
*  FOR THE LAST ITERATION WE USE THE FORMULATION:
*  x[5] = (x[4] * (1.0 - (v * x[4]))) + x[4]
*
         MPYF   R1,R0,R2      ; R2 = v * x[4] = 1.0..01.. => 1
         SUBRF  1.0,R2        ; R2 = 1.0 - v * x[4] = 0.0..01... => 0
         MPYF   R1,R2         ; R2 = x[4] * (1.0 - v * x[4])
         ADDF   R2,R1         ; R1 = x[5] = (x[4]*(1.0-(v*x[4])))+x[4]
*
         RND    R1,R0         ; Round since this is followed by a MPYF
*
*  NOW THE CASE OF v < 0 IS HANDLED.
*
         NEGF   R0,R2
         LDF    R3,R3         ; This sets condition flags
         LDFN   R2,R0         ; If v < 0, then R0 = -R0
*
         RETS
*
*  END
*
         .end
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3.5 Square Root Computation 


An iterative algorithm is used to compute a square root on the ’C3x; it is
similar to the one used for computation of the inverse. This algorithm
computes the inverse of the square root of a number v, 1 / SQRT(v). To derive
SQRT(v), multiply this result by v. Since in many applications division by
the square root of a number is desirable, the output of the algorithm saves
the effort to compute the inverse of the square root.


At the ith iteration, the estimate x[i] of 1 / SQRT(v) is computed from v and
the previous estimate x[i-1] according to this formula:


     x[i] = x[i-1] × (1.5 - (v/2) × x[i-1] × x[i-1])


To start the operation, an initial estimate x[0] is needed. If v = a × 2^e,
a good initial estimate is:


     x[0] = 1.0 × 2^(-e/2)


Example 3-7 shows the implementation of this algorithm on the ’C3x, where
the iteration is applied five times. Both accuracy and speed are affected by
the number of iterations. If you want more accuracy and less speed, increase
the number of iterations. If you want less accuracy and more speed, reduce
the number of iterations.
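
The square-root iteration can likewise be modeled on a host machine. The following Python sketch works in double precision and is not the ’C3x routine; the name `sqrt_newton` is hypothetical. It assumes the exponent e is halved after adding a rounding bit and then shifted arithmetically, as Example 3-7 does, and it returns non-positive inputs unchanged, which models the early return on non-positive numbers.

```python
import math


def sqrt_newton(v: float, iterations: int = 5) -> float:
    """Estimate sqrt(v) as v * x, where x converges to 1/sqrt(v)."""
    if v <= 0.0:
        return v                          # early return for non-positive input
    _, e = math.frexp(v)                  # v = m * 2**e with 0.5 <= m < 1
    e -= 1                                # now v = a * 2**e with 1 <= a < 2
    half_e = (e + 1) >> 1                 # rounding bit, then arithmetic shift
    x = math.ldexp(1.0, -half_e)          # x[0] = 1.0 * 2**(-e/2)
    half_v = 0.5 * v
    for _ in range(iterations):
        x = x * (1.5 - half_v * x * x)    # x[i] = x[i-1]*(1.5 - (v/2)*x[i-1]**2)
    return v * x                          # sqrt(v) = v * (1/sqrt(v))
```

With this starting value, v × x[0] × x[0] lies between 0.5 and 2, which keeps the iteration inside its region of convergence.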
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Example 3—7. Square Root of a Floating-Point Number 


*
*  TITLE SQUARE ROOT OF A FLOATING-POINT NUMBER
*
*
*  SUBROUTINE SQRT
*
*  THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE
*  COMPUTATION IS COMPLETED, SQRT(v) IS ALSO STORED IN R0. NOTE
*  THAT THE ALGORITHM ACTUALLY COMPUTES 1/SQRT(v).
*
*
*  TYPICAL CALLING SEQUENCE:
*
*       LDF   v,R0
*       CALL  SQRT
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+----------------------------------------
*  R0       | v = NUMBER TO FIND THE SQUARE ROOT OF
*           | (UPON THE CALL)
*  R0       | SQRT(v) (UPON THE RETURN)
*
*  REGISTER USED AS INPUT: R0
*  REGISTERS MODIFIED: R0, R1, R2, R3
*  REGISTER CONTAINING RESULT: R0
*
*  CYCLES: 50          WORDS: 39
*
         .global SQRT
*
*  EXTRACT THE EXPONENT OF v.
*
SQRT:    LDF    R0,R3         ; Save v
         RETSLE               ; Return if number is non-positive
         PUSHF  R0
         POP    R1
         ASH    -24,R1        ; The 8 LSBs of R1 contain exponent of v
         ADDI   1,R1          ; Add a rounding bit in the exponent
         ASH    -1,R1         ; e/2
*
*  X[0] FORMATION GIVEN THE EXPONENT OF v.
*
         NEGI   R1
         ASH    24,R1
         PUSH   R1
         POPF   R1            ; Now R1 = x[0] = 1.0 * 2**(-e/2)
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Example 3—7. Square Root of a Floating-Point Number (Continued) 


*
*  GENERATE v/2.
*
         MPYF   0.5,R0        ; v/2 and take rounding bit out
*
*  NOW THE ITERATIONS BEGIN.
*
         MPYF   R1,R1,R2      ; R2 = x[0] * x[0]
         MPYF   R0,R2         ; R2 = (v/2) * x[0] * x[0]
         SUBRF  1.5,R2        ; R2 = 1.5 - (v/2) * x[0] * x[0]
         MPYF   R2,R1         ; R1 = x[1] = x[0] *
*                             ;      (1.5 - (v/2)*x[0]*x[0])
         RND    R1
*
         MPYF   R1,R1,R2      ; R2 = x[1] * x[1]
         MPYF   R0,R2         ; R2 = (v/2) * x[1] * x[1]
         SUBRF  1.5,R2        ; R2 = 1.5 - (v/2) * x[1] * x[1]
         MPYF   R2,R1         ; R1 = x[2] = x[1] *
*                             ;      (1.5 - (v/2)*x[1]*x[1])
         RND    R1
*
         MPYF   R1,R1,R2      ; R2 = x[2] * x[2]
         MPYF   R0,R2         ; R2 = (v/2) * x[2] * x[2]
         SUBRF  1.5,R2        ; R2 = 1.5 - (v/2) * x[2] * x[2]
         MPYF   R2,R1         ; R1 = x[3] = x[2] *
*                             ;      (1.5 - (v/2)*x[2]*x[2])
         RND    R1
*
         MPYF   R1,R1,R2      ; R2 = x[3] * x[3]
         MPYF   R0,R2         ; R2 = (v/2) * x[3] * x[3]
         SUBRF  1.5,R2        ; R2 = 1.5 - (v/2) * x[3] * x[3]
         MPYF   R2,R1         ; R1 = x[4] = x[3] *
*                             ;      (1.5 - (v/2)*x[3]*x[3])
         RND    R1
*
         MPYF   R1,R1,R2      ; R2 = x[4] * x[4]
         MPYF   R0,R2         ; R2 = (v/2) * x[4] * x[4]
         SUBRF  1.5,R2        ; R2 = 1.5 - (v/2) * x[4] * x[4]
         MPYF   R2,R1         ; R1 = x[5] = x[4] *
*                             ;      (1.5 - (v/2)*x[4]*x[4])
         RND    R1,R0         ; Round
         MPYF   R3,R0         ; Sqrt(v) from sqrt(v**(-1))
         RETS
*
*  end
*
         .end
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3.6 Extended-Precision Arithmetic 


The ’C3x offers 32 bits of precision for integer arithmetic and 24 bits of
precision in the mantissa for floating-point arithmetic. For higher precision
in floating-point operations, the eight extended-precision registers R0 to R7
contain eight additional bits of accuracy. Since no comparable extension is
available for fixed-point arithmetic, this section shows how you can achieve
fixed-point double precision by using the processor. The technique consists
of performing the arithmetic by parts (which is similar to performing
longhand arithmetic).


In the instruction set, operations ADDC (add with carry) and SUBB (subtract
with borrow) use the status carry bit for extended-precision arithmetic. The
carry bit is affected by the arithmetic operations of the arithmetic logic
unit (ALU) and by the rotate and shift instructions. It can also be
manipulated directly by setting the status register to certain values. For
proper operation, the overflow mode bit should be reset (OVM = 0) so that the
accumulator results are not loaded with the saturation values. Example 3-8
and Example 3-9 show 64-bit addition and 64-bit subtraction. The first
operand is stored in registers R0 (low word) and R1 (high word). The second
operand is stored in R2 (low word) and R3 (high word). The result is stored
in R0 and R1.


Example 3-8. 64-Bit Addition 


*  TITLE 64-BIT ADDITION
*
*  TWO 64-BIT NUMBERS ARE ADDED TO EACH OTHER, PRODUCING
*  A 64-BIT RESULT. THE NUMBERS X (R1,R0) AND Y (R3,R2) ARE
*  ADDED, RESULTING IN W (R1,R0).
*
*            R1 R0
*          + R3 R2
*          -------
*            R1 R0
*
         ADDI   R2,R0
         ADDC   R3,R1


Example 3-9. 64-Bit Subtraction 


*  TITLE 64-BIT SUBTRACTION
*
*  TWO 64-BIT NUMBERS ARE SUBTRACTED FROM EACH OTHER,
*  PRODUCING A 64-BIT RESULT. THE NUMBERS X (R1,R0) AND
*  Y (R3,R2) ARE SUBTRACTED, RESULTING IN W (R1,R0).
*
*            R1 R0
*          - R3 R2
*          -------
*            R1 R0
*
         SUBI   R2,R0
         SUBB   R3,R1
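
The carry and borrow chaining used by Example 3-8 and Example 3-9 can be modeled on a host machine. The Python sketch below is illustrative only; the function names `add64` and `sub64` and the (low word, high word) argument order are assumptions. Each 64-bit value is held as two unsigned 32-bit words, and the carry or borrow produced by the low-word operation is fed into the high-word operation.

```python
MASK32 = 0xFFFFFFFF


def add64(x_lo: int, x_hi: int, y_lo: int, y_hi: int):
    """Return (low, high) words of X + Y, modulo 2**64."""
    low = x_lo + y_lo                  # ADDI: produces the carry
    carry = low >> 32
    high = x_hi + y_hi + carry         # ADDC: consumes the carry
    return low & MASK32, high & MASK32


def sub64(x_lo: int, x_hi: int, y_lo: int, y_hi: int):
    """Return (low, high) words of X - Y, modulo 2**64."""
    borrow = 1 if x_lo < y_lo else 0   # SUBI: produces the borrow
    low = (x_lo - y_lo) & MASK32
    high = (x_hi - y_hi - borrow) & MASK32   # SUBB: consumes the borrow
    return low, high
```

For example, adding 1 to the value with low word 0xFFFFFFFF and high word 0 yields low word 0 and high word 1.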


When two 32-bit numbers are multiplied, a 64-bit product results. The
procedure for multiplication is to split the 32-bit magnitude values of the
multiplicand X and the multiplier Y into two parts, (X1, X0) and (Y1, Y0),
respectively, with 16 bits each. The operation is done on unsigned numbers,
and the product is adjusted for the sign bit. Example 3-10 shows the
implementation of a 32-bit by 32-bit multiplication.
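
The split into 16-bit halves can be verified with a host-side model. The following Python sketch is not the ’C3x routine; the name `extmpy_model` is hypothetical, and the inputs are assumed to be signed values whose magnitudes fit in 31 bits. It forms the four partial products of the magnitudes, combines them into two 32-bit words, and negates the 64-bit result when the operand signs differ.

```python
MASK32 = 0xFFFFFFFF


def extmpy_model(x: int, y: int):
    """Return (low, high) words of the 64-bit two's-complement product."""
    negative = (x < 0) != (y < 0)            # sign of the product
    ax, ay = abs(x), abs(y)
    x1, x0 = ax >> 16, ax & 0xFFFF           # upper and lower halves of |X|
    y1, y0 = ay >> 16, ay & 0xFFFF           # upper and lower halves of |Y|
    p1 = x0 * y0
    middle = x0 * y1 + x1 * y0               # P2 + P3
    p4 = x1 * y1
    low = p1 + ((middle << 16) & MASK32)     # lower 16 bits of P2+P3, shifted up
    carry = low >> 32
    low &= MASK32
    high = (p4 + (middle >> 16) + carry) & MASK32
    if negative:                             # 64-bit negate: complement, add 1
        low = (~low + 1) & MASK32
        high = (~high + (1 if low == 0 else 0)) & MASK32
    return low, high
```

The two returned words correspond to W0 (low) and W1 (high) in the description above.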


Example 3-10. 32-Bit-by-32-Bit Multiplication 


*
*  TITLE 32 BIT X 32 BIT MULTIPLICATION
*
*
*  SUBROUTINE EXTMPY
*
*  FUNCTION: TWO 32-BIT NUMBERS ARE MULTIPLIED, PRODUCING A 64-BIT
*  RESULT. THE TWO NUMBERS (X AND Y) ARE EACH SEPARATED INTO TWO
*  PARTS (X1 X0) AND (Y1 Y0), WHERE X0, X1, Y0, AND Y1 ARE 16 BITS.
*  THE TOP BIT IN X1 AND Y1 IS THE SIGN BIT. THE PRODUCT IS
*  IN TWO WORDS (W0 AND W1). THE MULTIPLICATION IS PERFORMED ON
*  POSITIVE NUMBERS, AND THE SIGN IS DETERMINED AT THE END.
*
*
*             X1 X0        BITS OF PRODUCTS
*           x Y1 Y0        (NOT COUNTING SIGN)     PRODUCT
*           -------
*             X0*Y0        16+16                   P1
*          X0*Y1           16+16                   P2
*          X1*Y0           16+16                   P3
*       X1*Y1              16+16                   P4
*       -----------
*          W1   W0
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+-------------------------------------------
*  R0       | MULTIPLIER AND LOW WORD OF THE PRODUCT
*  R1       | MULTIPLICAND AND UPPER WORD OF THE PRODUCT
*
*
*  REGISTERS USED AS INPUT: R0, R1
*  REGISTERS MODIFIED: R0, R1, R2, R3, R4, AR0, AR1
*  REGISTER CONTAINING RESULT: R0, R1
*
*
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Example 3-10. 32-Bit-by-32-Bit Multiplication (Continued) 


*  CYCLES: 28 (WORST CASE)     WORDS: 25
*
         .global EXTMPY
*
EXTMPY   XOR3   R0,R1,AR0     ; Store sign
         ABSI   R0            ; Absolute values of X
         ABSI   R1            ; and Y
*
*  SEPARATE MULTIPLIER AND MULTIPLICAND INTO TWO PARTS
*
         LDI    -16,AR1
         LSH3   AR1,R0,R2     ; R2 = X1 = upper 16 bits of X
         AND    0FFFFH,R0     ; R0 = X0 = lower 16 bits of X
         LSH3   AR1,R1,R3     ; R3 = Y1 = upper 16 bits of Y
         AND    0FFFFH,R1     ; R1 = Y0 = lower 16 bits of Y
*
*  CARRY OUT THE MULTIPLICATION
*
         MPYI3  R0,R1,R4      ; X0*Y0 = P1
         MPYI   R3,R0         ; X0*Y1 = P2
         MPYI   R2,R1         ; X1*Y0 = P3
         ADDI   R0,R1         ; P2+P3
         MPYI   R2,R3         ; X1*Y1 = P4
*
         LDI    R1,R2
         LSH    16,R2         ; Lower 16 bits of P2+P3
         CMPI   0,AR0         ; Check the sign of the product
         BGED   DONE          ; If >0, multiplication complete
*                             ; (delayed)
         LSH    -16,R1        ; Upper 16 bits of P2+P3
         ADDI3  R4,R2,R0      ; W0 = R0 = lower word of the product
         ADDC3  R1,R3,R1      ; W1 = R1 = upper word of the product
*
*  NEGATE THE PRODUCT IF THE NUMBERS ARE OF OPPOSITE SIGNS
*
         NOT    R0
         ADDI   1,R0
         NOT    R1
         ADDC   0,R1
*
DONE     RETS
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IEEE/TMS320C3x Floating-Point Format Conversion 


The fast version of the IEEE-to-’C3x conversion routine was originally devel- 
oped by Apollo Computer, Inc. Other routines are based on this algorithm. 


In fixed-point arithmetic, the binary point that separates the integer from the 
fractional part of the number is fixed at a certain location. For example, if a 
32-bit number has the binary point after the most significant bit (MSB), which 
is also the sign bit, only fractional numbers (numbers with absolute values less 
than 1) can be represented. A number having 31 fractional bits is called a Q31 
number. All operations assume that the binary point is fixed at this location. 
The fixed-point system, although simple to implement in hardware, imposes 
limitations in the dynamic range of the represented number. This causes scal- 
ing problems in many applications. You can avoid this difficulty by using float- 
ing-point numbers. 


In a floating-point system, each integer or fraction is represented by three
fixed-point numbers that constitute a floating-point number. Therefore, a
floating-point number consists of a mantissa, m, multiplied by base b raised
to an exponent e:


     m × b^e


To provide the greatest resolution, the mantissa is typically a normalized
number with an absolute value between 1 and 2. Although the mantissa is
represented as a fixed-point number, the position of the actual value is
determined by the exponent e.


To achieve greater efficiency in hardware implementation, the ’C3x uses a
floating-point format that differs from the IEEE standard. This section
briefly describes the two formats and presents software routines that show
how to make conversions between the two formats.


’C3x floating-point format:

     Field:           e      s      f
     Width (bits):    8      1      23


In a 32-bit word representing a floating-point number in the ’C3x, the first
eight bits correspond to the exponent, expressed in twos-complement format.
There is one bit for sign and 23 bits for the mantissa. The mantissa is
expressed in twos-complement form, with the binary point after the most
significant nonsign bit. Since this bit is the complement of the sign bit s,
it is suppressed; the mantissa actually has 24 bits. A special case occurs
when e = -128. In this case, the number is interpreted as 0, independently of
the values of s and f (which are set to 0 by default). The values of the
represented numbers in the ’C3x floating-point format are as follows:


     2^e × (01.f)     if s = 0
     2^e × (10.f)     if s = 1
     0                if e = -128
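
The three cases above can be exercised with a small decoder. The following Python sketch is a host-side illustration only; the function name `c3x_word_to_float` is hypothetical. It reads the 8-bit exponent, the sign bit, and the 23-bit fraction from a 32-bit word and evaluates the twos-complement mantissa, where 01.f equals 1 + f/2^23 and 10.f equals -2 + f/2^23.

```python
def c3x_word_to_float(word: int) -> float:
    """Decode a 32-bit word laid out as e (8 bits), s (1 bit), f (23 bits)."""
    e = (word >> 24) & 0xFF
    if e >= 0x80:
        e -= 0x100                      # exponent is in twos-complement format
    if e == -128:
        return 0.0                      # special case: the value is 0
    s = (word >> 23) & 1
    f = word & 0x7FFFFF
    # s = 0: mantissa is 01.f ; s = 1: mantissa is 10.f (twos complement)
    mantissa = (1.0 if s == 0 else -2.0) + f / float(1 << 23)
    return mantissa * 2.0 ** e
```

Under this layout, the word 0x00000000 decodes to 1.0, 0xFF800000 decodes to -1.0, and 0x80000000 decodes to 0.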


IEEE floating-point format:

     Field:           s      e      f
     Width (bits):    1      8      23


The IEEE floating-point format uses sign-magnitude notation for the mantissa,
and the exponent is biased by 127. In a 32-bit word representing a
floating-point number, the first bit is the sign bit. The next eight bits
correspond to the exponent, which is expressed in an offset-by-127 format
(the actual exponent is e - 127). The following 23 bits represent the
absolute value of the mantissa with the most significant 1 implied. The
binary point is after this most significant 1. The mantissa actually has 24
bits. Several special cases are summarized below.


These are the values of the numbers represented in the IEEE floating-point
format:


     (-1)^s × 2^(e-127) × (01.f)     if 0 < e < 255


Special cases:


     (-1)^s × 0.0                    if e = 0 and f = 0 (zero)
     (-1)^s × 2^(-126) × (0.f)       if e = 0 and f <> 0 (denormalized)
     (-1)^s × infinity               if e = 255 and f = 0 (infinity)
     NaN (not a number)              if e = 255 and f <> 0

Based on these definitions of the formats, two versions of the conversion rou- 
tines were developed. One version handles the complete definition of the for- 
mats. The other ignores some of the special cases (typically the ones that are 
rarely used), but has the benefit of executing faster than the complete conver- 
sion. For this discussion, the two versions are referred to as the complete ver- 
sion and the fast version, respectively. 
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3.7.1. IEEE-to-TMS320C3x Floating-Point Format Conversion 


Example 3-11 shows the fast conversion from IEEE to ’C3x floating-point
format. It properly handles the general case when 0 < e < 255 and also
handles 0s (that is, e = 0 and f = 0). The other special cases (denormalized,
infinity, and NaN) are not treated and, if present, give erroneous results.
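
The mapping between the two formats for the general case and for 0 can be modeled on a host machine. The Python sketch below is not a transcription of the ’C3x routine; the name `ieee_to_c3x_word` is hypothetical, and the special cases that the fast version does not treat raise an exception here instead of giving erroneous results. For a negative number, the sign-magnitude mantissa -(1.f) is rewritten in twos-complement form: if f = 0 it becomes 10.0 with the exponent reduced by 1, otherwise it becomes 10.f' with f' = 2^23 - f.

```python
import struct


def ieee_to_c3x_word(x: float) -> int:
    """Convert an IEEE single-precision value to a 32-bit word (e, s, f)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if biased == 0 and f == 0:
        return 0x80000000               # zero: e = -128, s = 0, f = 0
    if biased == 0 or biased == 255:
        raise ValueError("denormalized, infinity, and NaN are not handled")
    e = biased - 127                    # remove the exponent bias
    if sign:
        if f == 0:
            e -= 1                      # -(1.0) * 2**e == (10.0) * 2**(e-1)
        else:
            f = (1 << 23) - f           # -(1.f) == 10.f' with f' = 2**23 - f
    return ((e & 0xFF) << 24) | (sign << 23) | f
```

For example, 1.0 maps to 0x00000000, -1.0 maps to 0xFF800000, and 0.0 maps to 0x80000000.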


Example 3-11. IEEE-to-TMS320C3x Conversion (Fast Version) 


*  TITLE IEEE TO TMS320C3x CONVERSION (FAST VERSION)
*
*
*  SUBROUTINE FMIEEE
*
*  FUNCTION: CONVERSION BETWEEN THE IEEE FORMAT AND THE
*            TMS320C3x FLOATING-POINT FORMAT. THE NUMBER TO
*            BE CONVERTED IS IN THE LOWER 32 BITS OF R0.
*            THE RESULT IS STORED IN THE UPPER 32 BITS OF R0.
*            UPON ENTERING THE ROUTINE, AR1 POINTS TO THE
*            FOLLOWING TABLE:
*
*            (0) 0xFF800000  <-- AR1
*            (1) 0xFF000000
*            (2) 0x7F000000
*            (3) 0x80000000
*            (4) 0x81000000
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+--------------------------------
*  R0       | NUMBER TO BE CONVERTED
*  AR1      | POINTER TO TABLE WITH CONSTANTS
*
*  REGISTERS USED AS INPUT: R0, AR1
*  REGISTERS MODIFIED: R0, R1
*  REGISTER CONTAINING RESULT: R0
*
*  NOTE: SINCE THE STACK POINTER SP IS USED, MAKE SURE TO
*        INITIALIZE IT IN THE CALLING PROGRAM.
*
*
*  CYCLES: 12 (WORST CASE)     WORDS: 12
*
         .global FMIEEE
*
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Example 3-11. IEEE-to-TMS320C3x Conversion (Fast Version) (Continued) 


FMIEEE   AND3   R0,*AR1,R1    ; Replace fraction with 0
         BND    NEG           ; Test sign
         ADDI   R0,R1         ; Shift sign
*                             ; and exponent inserting 0
         LDIZ   *+AR1(1),R1   ; If all 0, generate C30 0
         SUBI   *+AR1(2),R1   ; Unbias exponent
         PUSH   R1
         POPF   R0            ; Load this as a flt. pt. number
         RETS
*
NEG      PUSH   R1
         POPF   R0            ; Load this as a flt. pt. number
         NEGF   R0,R0         ; Negate if orig. sign is negative
         RETS


Example 3-12 shows the complete conversion between the IEEE and ’C3x
formats. In addition to the general case and the 0s, it handles the special
cases as follows:


-  If NaN (e = 255, f <> 0), the number is returned intact.


-  If infinity (e = 255, f = 0), the output is saturated to the most positive
   or negative number, respectively.


-  If denormalized (e = 0, f <> 0), two cases are considered. If the MSB of
   f is 1, the number is converted to ’C3x format. Otherwise, an underflow
   occurs, and the number is set to 0.
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Example 3-12. IEEE-to-TMS320C3x Conversion (Complete Version) 


*
*  TITLE IEEE TO TMS320C3x CONVERSION (COMPLETE VERSION)
*
*
*  SUBROUTINE FMIEEE1
*
*  FUNCTION: CONVERSION BETWEEN THE IEEE FORMAT AND THE TMS320C3x
*            FLOATING-POINT FORMAT. THE NUMBER TO BE CONVERTED
*            IS IN THE LOWER 32 BITS OF R0. THE RESULT IS STORED
*            IN THE UPPER 32 BITS OF R0.
*
*  UPON ENTERING THE ROUTINE, AR1 POINTS TO THE FOLLOWING TABLE:
*
*            (0) 0xFF800000  <-- AR1
*            (1) 0xFF000000
*            (2) 0x7F000000
*            (3) 0x80000000
*            (4) 0x81000000
*            (5) 0x7F800000
*            (6) 0x00400000
*            (7) 0x007FFFFF
*            (8) 0x7F7FFFFF
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+--------------------------------
*  R0       | NUMBER TO BE CONVERTED
*  AR1      | POINTER TO TABLE WITH CONSTANTS
*
*  REGISTERS USED AS INPUT: R0, AR1
*  REGISTERS MODIFIED: R0, R1
*  REGISTER CONTAINING RESULT: R0
*
*  NOTE: SINCE THE STACK POINTER SP IS USED, MAKE SURE TO
*        INITIALIZE IT IN THE CALLING PROGRAM.
*
*  CYCLES: 23 (WORST CASE)     WORDS: 34
*
         .global FMIEEE1
*
FMIEEE1  LDI    R0,R1
         AND    *+AR1(5),R1
         BZ     UNNORM        ; If e = 0, number is either 0 or
*                             ; denormalized
         XOR    *+AR1(5),R1
         BNZ    NORMAL        ; If e < 255, use regular routine
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Example 3-12. IEEE-to-TMS320C3x Conversion (Complete Version) (Continued) 


*
*  HANDLE NaN AND INFINITY
*
         TSTB   *+AR1(7),R0
         RETSNZ               ; Return if NaN
         LDI    R0,R0
         LDFGT  *+AR1(8),R0   ; If positive, infinity =
*                             ; most positive number
         LDFN   *+AR1(5),R0   ; If negative, infinity =
         RETS                 ; most negative number
*
*  HANDLE 0s AND UNNORMALIZED NUMBERS
*
UNNORM   TSTB   *+AR1(6),R0   ; Is the MSB of f equal to 1?
         LDFZ   *+AR1(3),R0   ; If not, force the number to 0
         RETSZ                ; and return
         XOR    *+AR1(6),R0   ; If MSB of f = 1, make it 0
         BND    NEG1
         LSH    1,R0          ; Eliminate sign bit
*                             ; & line up mantissa
         SUBI   *+AR1(2),R0   ; Make e = -127
         PUSH   R0
         POPF   R0            ; Put number in floating point format
         RETS
NEG1     POPF   R0
         NEGF   R0,R0         ; If negative, negate R0
         RETS
*
*  HANDLE THE REGULAR CASES
*
NORMAL   AND3   R0,*AR1,R1    ; Replace fraction with 0
         BND    NEG           ; Test sign
         ADDI   R0,R1         ; Shift sign and exponent inserting 0
         SUBI   *+AR1(2),R1   ; Unbias exponent
         PUSH   R1
         POPF   R0            ; Load this as a flt. pt. number
         RETS
*
NEG      POPF   R0            ; Load this as a flt. pt. number
         NEGF   R0,R0         ; Negate if original sign negative
         RETS
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3.7.2 TMS320C3x-to-IEEE Floating-Point Format Conversion 


The majority of the numbers represented by the ’C3x floating-point format
are covered by the general IEEE format and the representation of 0s. The only
special case is e = -127 in the ’C3x format; this corresponds to a
denormalized number in IEEE format. It is ignored in the fast version but
treated properly in the complete version. Example 3-13 shows the fast
version, and Example 3-14 shows the complete version of the ’C3x-to-IEEE
conversion.
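
The reverse mapping can be modeled in the same way. The following Python sketch is a host-side illustration and not a transcription of either listing; the name `c3x_word_to_ieee_bits` is hypothetical. Like the fast version, it does not treat the e = -127 case (it raises an exception instead), and it also rejects the one negative value whose magnitude exceeds the IEEE single-precision range. A negative twos-complement mantissa 10.f is rewritten as a sign and magnitude: if f = 0 the magnitude is 1.0 with the exponent increased by 1, otherwise it is 1.f' with f' = 2^23 - f.

```python
def c3x_word_to_ieee_bits(word: int) -> int:
    """Convert a 32-bit word (e, s, f) to an IEEE single-precision bit pattern."""
    e = (word >> 24) & 0xFF
    if e >= 0x80:
        e -= 0x100                      # twos-complement exponent
    if e == -128:
        return 0x00000000               # zero
    s = (word >> 23) & 1
    f = word & 0x7FFFFF
    if s:
        if f == 0:
            e += 1                      # (10.0) * 2**e == -(1.0) * 2**(e+1)
        else:
            f = (1 << 23) - f           # 10.f == -(1.f') with f' = 2**23 - f
    biased = e + 127                    # add the exponent bias
    if biased <= 0 or biased >= 255:
        raise ValueError("outside the normalized IEEE range (not handled here)")
    return (s << 31) | (biased << 23) | f
```

For example, the word 0x00000000 (1.0) maps to 0x3F800000 and the word 0xFF800000 (-1.0) maps to 0xBF800000.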


Example 3-13. TMS320C3x-to-IEEE Conversion (Fast Version) 


*
*  TITLE TMS320C3x TO IEEE CONVERSION (FAST VERSION)
*
*
*  SUBROUTINE TOIEEE
*
*  FUNCTION: CONVERSION BETWEEN THE TMS320C3x FORMAT AND THE IEEE
*            FLOATING-POINT FORMAT. THE NUMBER TO BE CONVERTED
*            IS IN THE UPPER 32 BITS OF R0. THE RESULT WILL BE IN
*            THE LOWER 32 BITS OF R0.
*
*  UPON ENTERING THE ROUTINE, AR1 POINTS TO THE FOLLOWING TABLE:
*
*            (0) 0xFF800000  <-- AR1
*            (1) 0xFF000000
*            (2) 0x7F000000
*            (3) 0x80000000
*            (4) 0x81000000
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+--------------------------------
*  R0       | NUMBER TO BE CONVERTED
*  AR1      | POINTER TO TABLE WITH CONSTANTS
*
*  REGISTERS USED AS INPUT: R0, AR1
*  REGISTERS MODIFIED: R0
*  REGISTER CONTAINING RESULT: R0
*
*  NOTE: SINCE THE STACK POINTER 'SP' IS USED, MAKE SURE TO
*        INITIALIZE IT IN THE CALLING PROGRAM.
*
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Example 3-13. TMS320C3x-to-IEEE Conversion (Fast Version) (Continued) 


*  CYCLES: 14 (WORST CASE)     WORDS: 15
*
TOIEEE   LDF    R0,R0         ; Determine the sign of the number
         LDFZ   *+AR1(4),R0   ; If 0, load appropriate number
         BND    NEG           ; Branch to NEG if negative (delayed)
         ABSF   R0            ; Take the absolute value of the number
         LSH    1,R0          ; Eliminate the sign bit in R0
         PUSHF  R0
         POP    R0            ; Place number in lower 32 bits of R0
         ADDI   *+AR1(2),R0   ; Add exponent bias (127)
         LSH    -1,R0         ; Add the positive sign
         RETS
*
NEG      POP    R0            ; Place number in lower 32 bits
*                             ; of R0
         ADDI   *+AR1(2),R0   ; Add exponent bias (127)
         LSH    -1,R0         ; Make space for the sign
         ADDI   *+AR1(3),R0   ; Add the negative sign
         RETS
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Example 3-14. TMS320C3x-to-IEEE Conversion (Complete Version) 


*
*  TITLE TMS320C3x TO IEEE CONVERSION (COMPLETE VERSION)
*
*
*  SUBROUTINE TOIEEE1
*
*  FUNCTION: CONVERSION BETWEEN THE TMS320C3x FORMAT AND THE IEEE
*            FLOATING-POINT FORMAT. THE NUMBER TO BE CONVERTED
*            IS IN THE UPPER 32 BITS OF R0. THE RESULT WILL BE
*            IN THE LOWER 32 BITS OF R0.
*
*  UPON ENTERING THE ROUTINE, AR1 POINTS TO THE FOLLOWING TABLE:
*
*            (0) 0xFF800000  <-- AR1
*            (1) 0xFF000000
*            (2) 0x7F000000
*            (3) 0x80000000
*            (4) 0x81000000
*            (5) 0x7F800000
*            (6) 0x00400000
*            (7) 0x007FFFFF
*            (8) 0x7F7FFFFF
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+--------------------------------
*  R0       | NUMBER TO BE CONVERTED
*  AR1      | POINTER TO TABLE WITH CONSTANTS
*
*  REGISTERS USED AS INPUT: R0, AR1
*  REGISTERS MODIFIED: R0
*  REGISTER CONTAINING RESULT: R0
*
*  NOTE: SINCE THE STACK POINTER 'SP' IS USED, MAKE SURE TO
*        INITIALIZE IT IN THE CALLING PROGRAM.
*
*  CYCLES:    (WORST CASE)     WORDS:
*
         .global TOIEEE1
*
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Example 3-14. TMS320C3x-to-IEEE Conversion (Complete Version) (Continued) 


TOIE 


CONT 


2 


n 


DrHerprpwvu Drpwvh ow 


nN E 


HUOAGANOCE 


AUNUDWO 


ww 
mM Ww 
N 


ae] 


RO, RO 
*+AR1 (4) ,RO 
NEG 

RO 


1,R0 

RO 

RO 
*+AR1(2),RO 
+1,R0 


*+AR1 (5) ,RO 
*+AR1 (7) ,RO 


RO 

RO 

+1,R0 

RO 

RO 

*+AR1 (6) ,RO 


RO 

CONT 

*+ARI (2),RO 
+1,R0 

*+AR1 (3),RO 


Determine the sign of the number 
If 0, load appropriate number 
Branch to NEG if negative (delayed) 
Take the absolute value 

of the number 
Eliminate the sign bit in RO 


Place number in lower 32 bits of RO 
Add exponent bias (127) 
Add the positive sign 


If e > 0, return 


Ife=06&f=0, 


Shift f right by one bit 


Add 1 to the MSB of f 


Place number in lower 32 bits of RO 


Add exponent bias (127) 
Make space for the sign 
Add the negative sign 
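
As a cross-check for either routine, the conversion can be modeled on a host machine. The Python sketch below is not a translation of the assembly; it decodes a 32-bit 'C3x single-precision word (8-bit two's-complement exponent in bits 31-24, sign in bit 23, fraction in bits 22-0) and lets the host's `struct` module produce the IEEE bit pattern. The function names are hypothetical.

```python
import struct

def c3x_to_value(word):
    """Decode a 32-bit 'C3x single-precision word into a Python float."""
    exp = (word >> 24) & 0xFF
    if exp >= 0x80:
        exp -= 0x100                 # exponent is 8-bit two's complement
    if exp == -128:
        return 0.0                   # reserved exponent encodes zero
    sign = (word >> 23) & 1
    frac = (word & 0x7FFFFF) / 2**23
    # Two's-complement mantissa with an implied bit: 01.f (s=0) or 10.f (s=1)
    mantissa = frac + (1.0 if sign == 0 else -2.0)
    return mantissa * 2.0**exp

def c3x_to_ieee_bits(word):
    """Return the IEEE 754 single-precision bit pattern for a 'C3x word."""
    return struct.unpack(">I", struct.pack(">f", c3x_to_value(word)))[0]

for w in (0x00000000, 0xFF800000, 0x01C00000, 0x81000000):
    print(f"{w:08X} -> {c3x_to_ieee_bits(w):08X}")
```

The last input has an exponent of -127, which has no normalized IEEE equivalent; the host produces a denormalized result, which is the case the complete version handles by shifting the fraction right and setting its MSB. The one 'C3x value with no IEEE single-precision counterpart (mantissa -2 with exponent 127) makes `struct.pack` raise `OverflowError` in this model.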
Memory Interfacing


The ’C3x interfaces connect to many device types. Each of these interfaces
is tailored to a particular family of devices.


4.1 System Configuration 


The devices that can be interfaced to the ’C3x include memory, DMA devices,
parallel and serial peripherals, and I/O devices. Figure 4-1 illustrates a typical
configuration of a ’C3x system with various external devices and the interfaces
to which they are connected.


Figure 4-1. Possible System Configurations

[Block diagram of the TMS320C3x and its interfaces. Interfaces shown: primary
bus, expansion bus, external DMA interface, interrupt interface, external flags,
timer interface, and two serial ports. Devices shown: memory, peripherals,
DMA devices, bit I/O, I/O devices, system control (reset generators, etc.),
TLC3204x AIC (analog I/O), and TCM29C13 codec.]


This block diagram represents a fully expanded system. In an actual design, you
can use any subset of the illustrated configuration that is appropriate.
4.2 External Interfaces


The ’C3x interface type depends on the device to which it is to be connected.
Each interface comprises one or more signal lines that transfer information and
control its operation. Figure 4-2 shows the signal line groupings for each of
these interfaces.


Figure 4-2. External Interfaces on the TMS320C3x
All of the interfaces are independent of one another, and you can perform
different operations simultaneously on each interface.


The primary and expansion buses implement the memory-mapped interface
to the device. The external direct memory access (DMA) interface allows
external devices to cause the processor to relinquish the primary bus and allow
direct memory access.
4.3 Primary Bus Interface


The ’C3x uses the primary bus to access the majority of its memory-mapped
locations. When a large amount of external memory is required in a system, it
is interfaced to the primary bus. The ’C30 expansion bus (discussed in the
External Memory Interface chapter of the TMS320C3x User's Guide) actually
comprises two mutually exclusive interfaces, controlled by the MSTRB and 
IOSTRB signals. Cycles on the expansion bus that are controlled by the MSTRB 
signal are equivalent to cycles on the primary bus, except that bank switching 
is not implemented on the expansion bus. Accordingly, the discussion of primary 
bus cycles in this section applies equally to MSTRB cycles on the expansion 
bus. 


Although you can use both the primary bus and the expansion bus to inter- 
face to a wide variety of devices, those most commonly interfaced to these 
buses are memory devices. This section presents detailed examples of 
memory interface. 




4.4 Zero-Wait-State Interface to Static RAMs 


Zero-wait-state read access time for the ’C3x is determined by the difference 
between the cycle time and the sum of the delay time for the interface signal 
H1 low to address valid and the data setup time before the next H1 low. (For 
more information, see the appropriate TMS320C3x Digital Signal Processor 
data sheet.) 


$$t_{c(H)} - \left(t_{d(H1L-A)} + t_{su(D)R}\right)$$


where:
$t_{c(H)}$ = H1/H3 cycle time
$t_{d(H1L-A)}$ = H1 low to address valid
$t_{su(D)R}$ = data valid before next H1 low (read)


For example, for full-speed, zero-wait-state interface to any device, the 60-ns 
’C3x requires a read access time of 30 ns from address valid to data valid. For
most memories, access time from a chip-select pin is the same as access time
from address valid; therefore, it is possible to use 30-ns memories at full speed 
with the 'C3x-33. This requires that there are no delays between the processor 
and the memories. However, because of interconnection delays and because 
some gating is normally required for chip-select generation, this is usually not 
the case. Slightly faster memories are required in most systems. 
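
The access-time budget described above can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, the 14-ns address delay is the t1 value listed in Table 4-1, and the 16-ns data setup time is inferred from the 30-ns figure quoted in the text rather than taken from a data sheet.

```python
def read_access_budget_ns(t_c_h, t_d_h1l_a, t_su_d_r):
    """Zero-wait-state read access time: cycle time minus
    (H1 low to address valid + data setup before the next H1 low)."""
    return t_c_h - (t_d_h1l_a + t_su_d_r)

def meets_zero_wait(gate_delay_ns, ram_access_ns, budget_ns):
    """True if select gating plus RAM access time fits within the budget."""
    return gate_delay_ns + ram_access_ns <= budget_ns

# Assumed 'C3x-33 values in nanoseconds (see the note above).
budget = read_access_budget_ns(t_c_h=60, t_d_h1l_a=14, t_su_d_r=16)

print(budget)                            # available read access time
print(meets_zero_wait(5, 25, budget))    # 25-ns RAM behind a 5-ns gate
print(meets_zero_wait(10, 25, budget))   # same RAM behind slower gating
```

The third call shows why slightly faster memories are usually needed: any gating delay beyond 5 ns pushes a 25-ns RAM past the budget.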


There are two distinct categories among currently available RAMs: 


- RAMs without output enable (OE) control lines, which include the
  1-bit-wide organized RAMs and most of the 4-bit-wide RAMs


- RAMs with OE controls, which include the byte-wide RAMs and a few of
  the 4-bit-wide RAMs


Many of the fastest RAMs do not provide OE control; they use chip-select 
(CS)-controlled write cycles to ensure that data outputs do not turn on for write 
operations. In CS-controlled write cycles, the write control line (WE) goes low 
before CS goes low, and internal logic holds the outputs disabled until the cycle 
is completed. Using CS-controlled write cycles is an efficient way to interface 
fast RAMs without OE controls to the ’C30 at full speed. 
In the case of RAMs with OE controls, using this signal can add flexibility to
many systems. Additionally, many of these devices can be interfaced by using 
CS-controlled write cycles with OE tied low, in the same manner as with RAMs 
without OE controls. There are, however, two requirements for interfacing to 
OE RAMs in this manner: 


- The RAM’s OE input must be gated internally with the chip-select pin and
  WE so that the device’s outputs do not turn on unless a read is being
  performed.


- The RAM must allow its address inputs to change while WE is low; some
  RAMs specifically prohibit this.


Figure 4—3 shows the ’C3x interface to Cypress Semiconductor’s CY7C186 
25-ns 8K x 8-bit CMOS static RAM with the OE control input tied low and a 
CS-controlled write cycle. 


Figure 4-3. TMS320C3x Interface to Cypress Semiconductor’s CY7C186 CMOS SRAM

[Circuit diagram: 4 x CY7C186-25]
In this circuit, the two chip-select pins on the RAM are driven by the STRB and
A23 pins, which are ANDed together internally. A23 locates the RAM at
addresses 00000h through 03FFFh in external memory, and STRB establishes
the CS-controlled write cycle. The WE control input is then driven by the ’C3x 
R/W signal. The OE input is not used and is connected to ground. 
The timing of read operations, shown in Figure 4-4, is very straightforward
because the two chip-select inputs are driven directly. The read access time
of the circuit is the inverter propagation delay added to the RAM’s chip-select
access time (t1 + t2 = 5 + 25 = 30 ns). This access time meets the ’C3x-33’s
specified 30-ns read access time requirement. 


Figure 4-4. Read Operations Timing 
During write operations, shown in Figure 4-5, the RAM’s outputs do not turn
on at all, because of the chip-select controlled write cycles. The chip-select
controlled write cycles are generated because R/W goes active (low) before
the STRB term of the chip-select input. Because the RAM’s output drivers are
disabled whenever the WE input is low (regardless of the state of the OE input),
bus conflicts with the ’C3x are automatically avoided with this interface. The
circuit's data setup and hold times (t1 and t2 in Figure 4-5) of approximately
50 ns and 20 ns easily meet the RAM’s minimum timing requirements of 10 ns
and 0 ns.


Figure 4—5. Write Operations Timing 


If you require more complex chip-select decode than can be accomplished in 
time to meet zero-wait-state timing, you can use wait states (see section 4.5, 
Wait States and Ready Signal Generation) or bank-switching techniques (see 
section 4.5.6). 


The CY7C186 SRAM's OE control is gated internally with a CS pin; the RAM’s 
outputs are not enabled unless the device is selected. This is critical if there 
are any other devices connected to the same bus. If there are no other devices 
connected to the bus, OE does not need to be gated internally with a
chip-select pin.


To interface RAM without OE controls to the ’C3x with a single memory bank 
and no other devices present on the bus, connect the memory’s CS input to 
STRB directly. If several devices must be selected, an additional gate is re- 
quired to AND the device select and STRB pins in order to drive the CS input 
that generates the chip-select controlled write cycles. In either case, the WE 
input is driven by the ’C3x R/W signal. If sufficiently fast gating is used, 25-ns
RAMs can be used. 


As with RAM with OE control lines, this approach works well only if a few banks 
of memory are implemented and if the chip-select decode can be accom- 
plished with only one level of gating. If many banks are required to implement 
very large memory spaces, bank switching can be used to provide for multiple 
bank select generation and still maintain full-speed accesses within each 
bank. Bank switching is discussed in detail in section 4.5.6 on page 4-15. 
4.5 Wait States and Ready Signal Generation


Wait states can greatly increase system flexibility and reduce hardware
requirements. The ’C3x can generate wait states on either the primary bus or
the expansion bus; both buses have independent sets of ready control logic.
This section discusses ready signal generation from the perspective of the
primary bus interface. However, since wait-state operation on the expansion
bus is similar to that on the primary bus, these discussions also pertain to
expansion bus operation. Ready signal generation is not included in
discussions of the expansion bus interface. See the TMS320C3x User's Guide
for more information.


Wait states are generated on the basis of the:


- Internal wait-state generator
- External ready input (RDY)
- Logical AND or OR of the two


When enabled, internally generated wait states affect all external cycles,
regardless of the address accessed. If different numbers of wait states are
required for various external devices, the external RDY input may be used to
tailor wait-state generation to specific system requirements.


If the logical AND (electrical OR) of the wait count and external ready signals
is selected, the later of the two signals controls the internal ready signal. Both
signals must occur. Accordingly, external ready control must be implemented
for each wait-state device, and the wait count ready signal must be enabled.


If the logical OR (or electrical AND, since the signals are low true) of the
external and internal wait-count ready signals is selected, the earlier of the two
signals generates a ready condition and allows the cycle to be completed. Both
signals do not need to be present.


4.5.1 ORing the Ready Signals


Performing an OR of the two ready signals can implement wait states for de- 
vices that require a greater number of wait states than are implemented with 
external logic (up to seven). This is useful, for example, if a system contains 
both fast and slow devices. In this case, fast devices can externally generate 
a ready signal with a minimum of logic, and slow devices can use the internal 
wait counter for larger numbers of wait states. When fast devices are ac- 
cessed, the external hardware responds promptly with a ready signal that ter- 
minates the cycle. When slow devices are accessed, the external hardware 
does not respond and the cycle is terminated after the internal wait count. 


You can also perform an OR of the two ready signals if conditions require bus
cycles to terminate before the number of wait states implemented with external
logic has elapsed. In this case, the wait count that is specified internally
is shorter than the number of wait states implemented with the external ready 
logic, and the bus cycle is terminated after the wait count. This technique can 
also safeguard against inadvertent accesses to nonexistent memory that 
would never respond with a ready signal and would lock up the ’C3x. 


If an OR of the two ready signals is used and the internal wait-state count is
less than the number of wait states implemented externally, the external ready
generation logic must reset its sequencing to allow a new cycle to begin
immediately following the end of the internal wait count. This requires that
consecutive cycles come from independently decoded areas of memory and that the
external ready generation logic restarts its sequence as soon as a new cycle begins.
Otherwise, the external ready generation logic can lose synchronization with 
bus cycles and generate improperly timed wait states. 


4.5.2 ANDing the Ready Signals 


Performing an AND of the two ready signals can implement wait states for de- 
vices that are equipped to provide a ready signal but cannot respond quickly 
enough to meet the ’C3x’s timing requirements. Specifically, if these devices 
normally indicate a ready condition and respond, when accessed, with a wait 
state until they are ready, using the logical AND of the two ready signals lowers 
the chip count in the system. In this case, the internal wait counter provides 
wait states initially and becomes ready after the external device has had time 
to send a not ready indication. The internal wait counter then remains ready 
until the external device also becomes ready, which terminates the cycle. 


In addition, performing an AND of the two ready signals can extend the number 
of wait states for devices that already have external ready logic implemented 
but require additional wait states under certain circumstances. 
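
The difference between the OR and AND selections comes down to which ready source ends the cycle. The Python sketch below models only that rule: it treats each source as a single count of wait states and does not model the not-ready pulse that an external device sends in the AND case. The function and mode names are hypothetical.

```python
def wait_states(mode, internal_count, external_count=None):
    """Number of wait states before the bus cycle completes.

    internal_count: wait states programmed in the internal generator.
    external_count: wait states after which external RDY is asserted,
                    or None if the external logic never responds.
    Returns None when the cycle would never terminate.
    """
    if mode == "or":                   # earlier of the two ends the cycle
        if external_count is None:
            return internal_count
        return min(internal_count, external_count)
    if mode == "and":                  # later of the two ends the cycle
        if external_count is None:
            return None
        return max(internal_count, external_count)
    raise ValueError(f"unknown mode: {mode}")

print(wait_states("or", 7, 0))       # fast device answers at once
print(wait_states("or", 7, None))    # slow device: internal count ends cycle
print(wait_states("and", 2, 5))      # external logic extends the cycle
print(wait_states("and", 2, None))   # no external ready: cycle never ends
```

The second call is also the safeguard described in section 4.5.1: with the OR selection, an access to nonexistent memory still terminates after the internal wait count.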


4.5.3 External Ready Signal Generation 


The technique for implementing external ready generation hardware depends 
on the characteristics of the system. The optimum approach to ready signal 
generation varies, depending on the relative number of wait-state and non- 
wait-state devices in the system and on the maximum number of wait states 
required for any one device. The approach discussed here is general enough 
for most applications and can easily be modified and applied to many different 
system configurations. 




Ready signal generation involves the following steps:

1) Segmenting the address space to distinguish fast and slow devices
2) Generating properly timed ready indications
3) Logically ORing all of the separate ready timing signals together to
   connect to the physical ready input


Segmenting the address space, which is commonly performed by chip-select 
generation, is required to obtain a unique indication of each area within the 
address space that requires wait states. You can use chip-select signals to 
initiate wait states; however, chip-select decoding considerations may 
occasionally provide signals that do not meet ready input timing requirements. 
In this case, you can use a small number of address lines to segment the
address space coarsely. The simpler gating allows signals to be generated more
quickly. In either case, the signal that indicates a particular area of memory is 
being addressed normally initiates a ready or wait-state indication. 


Once the region of address space being accessed has been established, a 
timing circuit provides a ready indication to the processor at the appropriate 
point in the cycle. 


Finally, since indications of ready status from multiple devices are typically 
present, the signals are logically ORed by using a single gate to drive the RDY 
input. 
4.5.4 Ready Control Logic


You can take one of two basic approaches to implement ready control logic, 
depending on the state of the ready input between accesses: 


- If RDY is low between accesses, the processor is always ready unless a
wait state is required. 


Control of full-speed devices is straightforward; no action is necessary be- 
cause the ready signal is always active unless otherwise programmed. 
Devices requiring wait states, however, must drive ready high fast enough 
to meet the input timing requirements. Then, after an appropriate delay, a 
ready indication must be generated. This can be difficult in many circum- 
stances, because wait-state devices are inherently slow and often require 
complex select decoding. 


- If RDY is high between accesses, the processor enters a wait state unless
a ready indication is generated. 


Zero-wait-state devices, which tend to be inherently fast, can usually re- 
spond immediately with a ready indication. Wait-state devices can delay 
their select signals to generate a ready indication. Typically, this approach 
results in the most efficient implementation of ready control logic. 
Figure 4-6 shows a circuit of this type, which can be used to generate 
zero, one, or two wait states for multiple devices in a system. 
Figure 4-6. Circuit for Generation of Zero, One, or Two Wait States for Multiple Devices


In the circuit in Figure 4-6, full-speed devices drive ready signals directly
through the ’74AS21 NOR gate, and the two flip-flops delay wait-state devices’ 
select signals one or two H1 cycles to provide one or two wait states. 
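
The delay-chain idea can be simulated in a few lines. The Python sketch below assumes one plausible topology (the two-wait-state selects feed the first flip-flop, its output is combined with the one-wait-state selects into the second flip-flop, and that output is combined with the zero-wait-state selects to form ready); the actual gate arrangement in Figure 4-6 may differ. All names are hypothetical, and signals are modeled as active-high booleans for readability.

```python
def count_wait_states(sel0, sel1, sel2, max_cycles=8):
    """Count H1 cycles until ready for a device select that stays asserted.

    sel0, sel1, sel2: select asserted for a zero-, one-, or
    two-wait-state device. Raises if ready is never generated.
    """
    ff1 = ff2 = False                  # both flip-flops start cleared
    for cycle in range(max_cycles):
        ready = sel0 or ff2            # sampled once per H1 cycle
        if ready:
            return cycle
        # Clock edge: both flip-flops load from their inputs together.
        ff1, ff2 = sel2, (sel1 or ff1)
    raise RuntimeError("no ready indication: processor would wait forever")

print(count_wait_states(True, False, False))
print(count_wait_states(False, True, False))
print(count_wait_states(False, False, True))
```

With RDY inactive between accesses, an address that matches no select never produces ready, which is why the sketch raises an error in that case.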


Considering the 'C3x-33’s ready signal delay time of 8 ns following the ad- 
dress, zero-wait-state devices must use ungated address lines directly to drive 
the input of the ’74AS21, since this gate contributes a maximum propagation 
delay of 6 ns to the RDY signal. Zero-wait-state devices must be grouped to- 
gether within a memory address range if other devices in the system require 
wait states. 


With this circuit, devices requiring wait states might take up to 36 ns to provide
inputs to the ’74AS20 OR gates from a valid address on the ’C3x. This
usually allows sufficient time for any decoding required in generating select
signals for slower devices in the system. For example, the 74ALS138
multiplexer, driven by the address bus and STRB pin, can generate select decodes
in 22 ns, which easily meets the ’C3x-33’s timing requirements.


With this circuit, unused inputs to either the ’74AS20 OR gates or the ’74AS21
NOR gate must be tied to a logic high level to prevent noise from generating 
spurious wait states. 


If more than two wait states are required by devices within a system, other ap- 
proaches can be used for ready signal generation. If between three and seven 
wait states are required, additional flip-flops can be included in the same man- 
ner shown in Figure 4—6, or internally generated wait states can be used in 
conjunction with external hardware. If more than seven wait states are re- 
quired, an external circuit using a counter can be used to supplement the capa- 
bilities of the internal wait-state generators. 


4.5.6 Bank-Switching Techniques 


The ’C3x’s programmable bank-switching feature can greatly ease conflicts 
on system design circuits when large amounts of memory are required. Nor- 
mally, devices take longer to release the bus than they take to drive the bus; 
bank switching provides a period of time for disabling all device selects that 
are not present otherwise. During this interval, slow devices are allowed time 
to turn off before other devices have the opportunity to drive the data bus, thus 
avoiding bus contention. (See the TMS320C3x User’s Guide for further infor- 
mation on bank switching.) 


When a portion of the high order address lines changes (as defined by the
contents of the BNKCMPR register) and bank switching is enabled, STRB goes
high for one full H1 cycle. If STRB is included in chip-select decodes, this 
causes all devices to be disabled during this period. The next bank of devices 
is not enabled until STRB goes low again. 
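
The bank comparison can be modeled as a comparison of the upper address bits of consecutive accesses. The Python sketch below assumes that BNKCMPR holds the number of most significant bits of the 24-bit address to compare, which is consistent with the statement later in this section that a value of 0Bh places bank boundaries every 8K words on lines A23-A13. It treats every access as a read, and the function names are hypothetical.

```python
ADDRESS_BITS = 24

def bank_of(address, bnkcmpr):
    """Upper `bnkcmpr` bits of a 24-bit address identify the bank."""
    return address >> (ADDRESS_BITS - bnkcmpr)

def extra_cycles(read_addresses, bnkcmpr):
    """Count the extra H1 cycles inserted when consecutive reads
    fall in different banks."""
    count = 0
    previous = None
    for address in read_addresses:
        bank = bank_of(address, bnkcmpr)
        if previous is not None and bank != previous:
            count += 1                 # STRB held high for one H1 cycle
        previous = bank
    return count

reads = [0x80A000, 0x80A001, 0x80BFFF, 0x80C000, 0x80C001]
print(extra_cycles(reads, bnkcmpr=0x0B))
```

Only the step from 80BFFFh to 80C000h crosses an 8K-word boundary, so a loop that stays inside one bank pays no penalty.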


In general, bank switching is not required during writes because write cycles 
always exhibit an inherent one-half H1 cycle setup of address information be- 
fore STRB goes low. When you use bank switching for read/write devices, a 
minimum of one-half H1 cycle of address setup is provided for all accesses. 
Therefore, large amounts of memory can be accessed without requiring wait 
states or extra hardware for isolation between banks. Access time for cycles 
with bank switching is the same as that for cycles without bank switching.
Accordingly, full-speed accesses can still be accomplished within each bank.
When you use bank switching to implement large multiple-bank memory
systems, you must consider address line fanout/loading. Besides parametric
specifications which must be accounted for, ac characteristics are crucial in 
memory system design. With large memory arrays, which commonly require 
large numbers of address line inputs to be driven in parallel, capacitive loading 
of address outputs is often quite large. Because all 'C3x timing specifications 
are guaranteed up to a capacitive load of 80 pF, using greater loads invalidates 
guaranteed ac characteristics. It is often necessary to provide buffering for ad- 
dress lines when using large memory arrays. The ac timing specifications for 
buffer performance can then be derated according to manufacturer specifica- 
tions to accommodate a wide variety of memory array sizes. 


The circuit shown in Figure 4—7 illustrates the use of bank switching with 
Cypress Semiconductor’s CY7C185 25-ns 8K x 8-bit CMOS static RAM. This 
circuit implements 32K 32-bit words of memory with one-wait-state accesses 
for each bank. 


The bank memory requires a wait state with this implementation because of 
the added propagation delay presented by the address bus buffers used in the 
circuit. The wait state is not a function of the memory organization of multiple 
banks or the use of bank switching. Memory access speeds are the same with 
and without bank switching, once bank boundaries are crossed. No speed 
penalty is incurred by using bank switching, except for the occasional extra 
cycle inserted when bank boundaries are crossed. If this extra cycle impacts 
software performance significantly, you can often restructure code to minimize 
bank boundary crossings and reduce the effect of these boundary crossings 
on software performance. 


The wait state for this bank memory is generated by using the wait-state gener- 
ator circuit described in section 4.5.5 on page 4-14. Because the A23 signal 
enables the entire bank memory system, the inverted version of this signal is 
ANDed with STRB to derive a one-wait-state device select. This signal is then 
connected in the circuit along with the other one-wait-state device selects. Any 
time a bank memory access occurs, one wait state is generated. 
Figure 4-7. Bank Switching for Cypress Semiconductor’s CY7C185 SRAM


Each of the four banks in this circuit is selected by decoding signals A15-A13
generated by the ’74ALS138 multiplexer (see Figure 4-8). With the
BNKCMPR register set to 0Bh, the banks are selected on even 8K-word
boundaries, starting at location 080A000h in external memory space.
Figure 4-8. Bank-Memory Control Logic


The ’C3x rated capacitive loading is 80 pF. The ’74ALS254 buffers used on the
address lines are necessary in this design because the total capacitive load 
presented to each address line is a maximum of 16 x 10 pF or 160 pF (bank 
memory plus zero-wait-state static RAM). Using the manufacturer’s derating 
curves for these devices at a load of 80 pF (the load presented by the bank 
memory) predicts propagation delays at the output of the buffers to a maximum
of 16 ns. The access time of a read cycle within a bank of the memory is the 
sum of the memory access time and the maximum buffer propagation delay 
(25 + 16 = 41 ns). Since this propagation delay falls between 30 and 90 ns, it 
requires only one wait state on the ’C3x-33. 
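
The loading and wait-state arithmetic in this paragraph can be reproduced directly. The Python sketch below is illustrative: the function names are hypothetical, and the wait-state rule simply encodes the statement that access times between 30 and 90 ns need one wait state on the ’C3x-33 (a 30-ns zero-wait-state budget plus one 60-ns cycle per wait state).

```python
import math

RATED_LOAD_PF = 80                     # 'C3x rated capacitive loading

def needs_buffering(device_count, pf_per_input):
    """True if the unbuffered address-line load exceeds the rated load."""
    return device_count * pf_per_input > RATED_LOAD_PF

def wait_states_needed(access_ns, zero_ws_budget_ns=30, cycle_ns=60):
    """Smallest number of wait states that covers the access time."""
    if access_ns <= zero_ws_budget_ns:
        return 0
    return math.ceil((access_ns - zero_ws_budget_ns) / cycle_ns)

print(needs_buffering(16, 10))         # 160 pF against an 80-pF rating
access = 25 + 16                       # RAM access + buffer delay, in ns
print(access, wait_states_needed(access))
```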


The ’74ALS254 buffers offer an additional system-performance enhancement:
they include 25-Ω resistors in series with each buffer output. These resistors
greatly improve the transient response characteristics of the buffers,
especially when driving CMOS loads, such as the memories used here. The
effect of these resistors is to reduce overshoot and ringing, which are common
when driving predominantly capacitive loads, such as for CMOS devices. The
result is reduced noise and increased immunity in the circuit, which, in turn, 
results in a more reliable memory system. Having these resistors included in 
the buffers eliminates the need to put discrete resistors in the system, which 
is often required in high-speed memory systems. 


This circuit cannot be implemented without bank switching because the data 
output’s turn-on and turn-off delays cause bus conflicts. The propagation delay 
of the ’74ALS138 multiplexer is involved only during bank switches, when
there is sufficient time between cycles to allow new chip-selects to be 
decoded. 


Figure 4-9 shows the timing of this circuit for read operations using bank 
switching. With the BNKCMPR register set to 0Bh, when a bank switch occurs,
the bank address on address lines A23—A13 is updated during the extra H1 
cycle while STRB is high. Then, after chip-select decodes have stabilized and 
the previously selected bank has disabled its outputs, STRB goes low for the 
next read cycle. Further accesses occur at normal bus timings with one wait 
state, as long as another bank switch is not necessary. Write cycles do not re- 
quire bank switching because of the inherent address setup provided in their 
timings. This timing is summarized in Table 4—1. 


Figure 4-9. Timing for Read Operations Using Bank Switching 
Table 4-1. Bank-Switching Interface Timing for the TMS320C3x-33


Time Interval   Event                                      Time Period
t1              H1 falling to address valid/STRB rising    14 ns
t2              Address valid to select delay              10 ns
t3              Memory disable from STRB                   10 ns
t4              H1 falling to STRB                         10 ns
t5              STRB to select delay                       4.5 ns
t6              Memory output enable delay                 3 ns
4.6 Interfacing Memory to the TMS320C32 DSP


The 'C32 accesses external memory with one 24-bit address bus, one 32-bit 
data bus, and three strobes: IOSTRB, STRB0, and STRB1. The strobes are
mapped to selected portions of the memory map as shown in Figure 4-10 on
page 4-23. For example, if the CPU is reading data from location 881234h, the
active strobe during the read bus cycle is STRB0. Unlike the other two strobes,
STRB0 is assigned to two noncontiguous address spaces within the memory
map to provide extra flexibility in address decoding for glueless memory
interfaces.


The behavior of IOSTRB is similar to that of its counterpart in the ’C30. Its
timing characteristics are slightly relaxed in comparison with STRB0 and STRB1
cycles to better accommodate slower I/O peripherals. In contrast to STRB0
and STRB1, IOSTRB uses a single signal line and accesses the external data
one full 32-bit word at a time. STRB0 and STRB1 are composed of four signal
lines each. The multiple signal lines per strobe enable the STRB0 and STRB1
cycles to access external memory one byte, one half-word, or one full word at
a time. For example, to read a single byte from a 32-bit-wide external memory
location mapped to STRB0, the address on the address bus points to the
selected 32-bit word and only one STRB0 signal is activated (driven low) to select
the desired byte. To access two bytes of data at the memory location mapped
to STRB1, two STRB1 signal lines are asserted during the bus cycle. Full
32-bit bus cycles involving STRB0 or STRB1 memory space result in four
strobe signals simultaneously accessing four bytes of data. The 32-bit STRB0
and STRB1 bus cycles are no different functionally from the IOSTRB cycles
but simply have tighter timing parameters. 
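
The relationship between data size and the number of strobe lines asserted can be sketched as follows for a 32-bit-wide memory. The counts (one, two, or four lines) come from the paragraph above; the rule used here to pick which lane is driven, based on the low bits of the element index, is an assumption made for illustration and should be checked against the ’C32 documentation. The function name is hypothetical.

```python
def active_byte_lanes(data_size_bits, element_index):
    """Byte lanes (0-3) whose strobe line is driven low for one access
    to a 32-bit-wide memory. Lane selection is an assumed rule."""
    if data_size_bits == 32:
        return [0, 1, 2, 3]            # full word: all four lines
    if data_size_bits == 16:
        first = (element_index & 1) * 2
        return [first, first + 1]      # half-word: one lane pair
    if data_size_bits == 8:
        return [element_index & 3]     # byte: a single lane
    raise ValueError("data size must be 8, 16, or 32 bits")

for size in (8, 16, 32):
    print(size, active_byte_lanes(size, element_index=1))
```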


The STRB0 and STRB1 cycles are not limited to just selecting bytes out of
32-bit memory locations. There are two strobe control registers that configure
the data size and memory width for STRB0 and STRB1 bus cycles (one control
register per strobe). With proper initialization of the strobe control registers, the 
bus cycles can be configured to encompass any combination of data size and 
physical memory width. For example, a byte can be read from a 16-bit-wide 
memory or a 32-bit word can be written to an 8-bit-wide memory by configuring 
the memory width and data size fields of the corresponding strobe control reg- 
isters (see Figure 4—10). 
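
One consequence of mixing data sizes and memory widths is that a single access may take more than one bus cycle. The Python sketch below assumes that an access needs one cycle per memory-width slice of the data, which matches the later remark that 8-bit-wide program memory would cost four cycles per 32-bit instruction. The function name is hypothetical.

```python
VALID_SIZES = (8, 16, 32)

def bus_cycles(data_size_bits, memory_width_bits):
    """Bus cycles needed to move one data item, assuming one cycle
    per memory-width slice (never fewer than one cycle)."""
    if data_size_bits not in VALID_SIZES or memory_width_bits not in VALID_SIZES:
        raise ValueError("sizes must be 8, 16, or 32 bits")
    return max(1, data_size_bits // memory_width_bits)

print(bus_cycles(8, 16))     # byte read from 16-bit-wide memory
print(bus_cycles(32, 8))     # 32-bit word written to 8-bit-wide memory
print(bus_cycles(32, 16))    # 32-bit fetch from 16-bit-wide memory
```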


Like other members of the ’C3x generation, the ’C32 program, as well as the
data, can reside in any portion of the memory map. The ’C32 program fetches
from address space mapped to IOSTRB are indistinguishable from IOSTRB
data reads or writes. However, the STRB0 and STRB1 cycles are configured
slightly differently for program fetches than for data accesses. Program and
data can still share the same portions of the memory map, but instead of
setting the memory width and data size fields in STRB0 and STRB1 control
registers, the program fetch cycles from the memory spaces mapped to STRB0 and
STRB1 are configured by hardwiring the PRGW (program memory width
select) pin. There is no need to use the data size fields, because all program
fetches apply only to instruction words that are 32 bits wide. The memory width 
field of the strobe control register is useless at reset, when the processor is 
fetching the reset vector from memory. At that point the strobe control register 
is always configured in the same way, but different systems can have different 
memory widths. The PRGW pin indicates to the memory interface whether the 
program memory is 16 or 32 bits wide. Program memory that is 8 bits wide is 
not supported, because four cycles per instruction degrade the performance 
too much for it to be useful for most applications. 
Figure 4-10. STRB0 and STRB1 Control Registers and the PRGW Pin
4.6.1 Functional Description of the Enhanced Memory Interface


The enhanced memory interface controls all data and program traffic between 
data buses inside the chip and the 32-bit external memory bus as shown in 
Figure 4-10 through Figure 4-13. For any bus cycle involving a logical 
memory address range mapped to IOSTRB, the memory interface simply con- 
nects the external data bus with an appropriate internal data bus without fur- 
ther data manipulation. 


The memory interface is much busier when the ’C32 is accessing logical
memory addresses mapped to STRB0 and STRB1. Depending on the data size and
external memory width (as defined by the corresponding strobe control
registers), data can be packed, unpacked, truncated, or shifted on its way
to and from the chip.


Section 4.6.1.1 through section 4.6.1.4 illustrate how the data is
manipulated when the interface has to match variable-size data with 8-, 16-,
and 32-bit-wide physical memories. In these sections, the following five
lines of code are included in the program space in each figure:


        LDI    4,RC          ; repeat count: the block runs RC + 1 = 5 times
        RPTB   L1            ; repeat the block that ends at label L1
        LDI    *AR0++,R0     ; read the next integer from the input space
        FLOAT  R0,R1         ; convert the integer to floating point
L1      STF    R1,*AR1++     ; store the result to the output space


These lines of code read five integers from one data space, convert them to
floating-point format, and write them to another memory space that is
assigned to a different strobe. Each example has a different combination of
data sizes and external memory widths to illustrate the range of possible
combinations.


For data access and program fetch cycles in which the data size exceeds the 
physical memory width, the least significant bytes/half-words are always 
transferred first. 
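

The ordering rule above can be illustrated with a short model. The following
Python sketch is not part of the ’C32 tool set; the function name
split_into_bus_cycles and its arguments are hypothetical and are used only
to show how a value wider than the physical memory might be cut into
memory-width pieces, least significant piece first.

```python
def split_into_bus_cycles(value, data_size, memory_width):
    """Return the pieces placed on the bus, in transfer order.

    Simplified model (an assumption, not a description of the silicon):
    when data_size exceeds memory_width, the value is cut into
    memory_width-bit pieces and the least significant piece goes first.
    """
    if data_size not in (8, 16, 32) or memory_width not in (8, 16, 32):
        raise ValueError("sizes must be 8, 16, or 32 bits")
    value &= (1 << data_size) - 1            # keep only data_size bits
    if data_size <= memory_width:
        return [value]                       # a single bus cycle is enough
    mask = (1 << memory_width) - 1
    cycles = data_size // memory_width
    return [(value >> (i * memory_width)) & mask for i in range(cycles)]


if __name__ == "__main__":
    # A 32-bit word written to 16-bit-wide memory takes two cycles.
    print([hex(p) for p in split_into_bus_cycles(0x12345678, 32, 16)])
```

In this model, a 32-bit value sent to 8-bit-wide memory produces four
pieces, which is the four-cycle case described above for program memory.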
4.6.1.1 STRB0 and STRB1 Data Access: Data Size = Memory Width


For the STRB0 and STRB1 data accesses in this example, the data size equals
the memory width. The data size and memory width for STRB0 and STRB1 data
access cycles are configured in the corresponding strobe control registers
(see Table 4-2).


The short program stored in the internal RAM0 memory begins with the load
integer (LDI) instruction reading an 8-bit integer from 8-bit-wide STRB0
memory (see Figure 4-11). As the integer data passes through the memory
interface, it is sign extended to 32 bits and loaded into R0 as a 32-bit
integer. Next, the integer-to-floating-point conversion (FLOAT) instruction
converts the integer in R0 to a 40-bit floating-point number and loads it
into R1. Finally, the store floating-point value (STF) instruction truncates
the 40-bit contents of R1 to 32 bits and stores the result in the
16-bit-wide STRB1 memory. As the data passes through the memory interface,
the 24-bit mantissa is truncated to eight bits (the 8-bit exponent remains
unmodified).


Table 4-2. STRB0 and STRB1 Data Access: Data Size = Memory Width


Data Access    Strobe    Data Size (Bits)    Memory Width (Bits)
Input data     STRB0     8                   8
Output data    STRB1     16                  16
Program        RAM0      32                  32
Figure 4-11. STRB0 and STRB1 Data Access: Data Size = Memory Width
4.6.1.2 STRB0 and STRB1 Data Access: Data Size ≠ Memory Width


The input and/or output data does not have to be the same size as the memory
it is being read from or written to (see Table 4-3). The data size and
memory width for STRB0 and STRB1 data access cycles are configured in the
corresponding strobe control registers.


The short program stored in the RAM1 memory begins with the LDI instruction
reading an 8-bit integer from 16-bit-wide STRB0 memory (see Figure 4-12).
Since each address contains two data bytes, the memory interface uses
different STRB0 lines to differentiate between the high byte and the low
byte. (STRB0 and STRB1 each comprise four signals, one for each byte of the
32 bits.) Next, the FLOAT instruction converts the integer in R0 to a 40-bit
floating-point number and loads it into R1. Finally, the STF instruction
stores the contents of R1 to 16-bit-wide memory as a 32-bit number. Before
the data arrives at the memory interface, the 32-bit mantissa is truncated
to 24 bits (the 8-bit exponent remains unmodified). The memory interface
then stores the 24-bit mantissa and the 8-bit exponent in 16-bit-wide
memory, two bytes at a time, using two cycles and two physical memory
addresses.


Table 4-3. STRB0 and STRB1 Data Access: Data Size ≠ Memory Width


Data Access    Strobe    Data Size (Bits)    Memory Width (Bits)
Input data     STRB0     8                   16
Output data    STRB1     32                  16
Program        RAM1      32                  32
4.6.1.3 Program Fetch From 16-Bit STRB0 Memory


Table 4-4 shows program memory mapped to 16-bit-wide STRB0 or STRB1 memory;
the PRGW pin is hardwired to a high state to select this program memory
width. The 32-bit data transfers to and from the 32-bit-wide external memory
do not involve any data operations in the memory interface.


The short program stored in STRB0 memory begins with the LDI instruction
reading a 32-bit integer from 32-bit-wide IOSTRB memory and loading it into
R0 (see Figure 4-13). Next, the FLOAT instruction converts the integer in R0
to a 40-bit floating-point number and loads it into R1. Finally, the STF
instruction truncates the 40-bit contents of R1 to 32 bits and stores the
result in the 32-bit-wide STRB1 memory. The data is not modified as it
passes through the memory interface.


The program controlling the data conversion in this example is stored in the
16-bit-wide memory bank mapped to STRB0. As discussed earlier, program fetch
cycles do not reference the strobe control register to determine the width
of the program memory. Instead, the memory interface checks the state of the
PRGW pin to determine the memory width. Because the program memory is 16
bits wide, the PRGW pin should be pulled up to Vcc, effectively directing
the memory interface to fetch instructions in two bus cycles per instruction
(16 bits at a time).


Table 4-4. Program Fetch From 16-Bit STRB0 Memory


Data Access    Strobe    Data Size (Bits)    Memory Width (Bits)
Input data     IOSTRB    32                  32
Output data    STRB1     32                  32
Program        STRB0     32                  16
Figure 4-13. Program Fetch From 16-Bit STRB0 Memory
4.6.1.4 Program Fetch From 32-Bit STRB1 Memory


Table 4-5 shows program memory mapped to 32-bit-wide STRB0 or STRB1 memory;
the PRGW pin is hardwired to a low state to select this program memory
width. The 32-bit data transfers to and from the 32-bit-wide external memory
do not involve any data operations in the memory interface.


The short program stored in STRB1 memory begins with the LDI instruction
reading a 32-bit integer from 32-bit-wide STRB0 memory and loading it into
R0 (see Figure 4-14). Next, the FLOAT instruction converts the integer in R0
to a 40-bit floating-point number and loads it into R1. Finally, the STF
instruction truncates the 40-bit contents of R1 to 32 bits and stores the
result in the 32-bit-wide IOSTRB memory. The data is not modified as it
passes through the memory interface.


The program controlling the data conversion in this example is stored in the
32-bit-wide memory bank mapped to STRB1. Program fetch cycles do not
reference the strobe control register to determine the width of the program
memory. Instead, the memory interface checks the state of the PRGW pin to
determine the memory width. Because the program memory is 32 bits wide, the
PRGW pin should be grounded, effectively directing the memory interface to
fetch instructions in one bus cycle per instruction (32 bits at a time).
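

The relationship between the PRGW pin and the number of bus cycles per
instruction fetch, as described in this section and in section 4.6.1.3, can
be summarized in a few lines. The Python sketch below is only a model; the
function name program_fetch_cycles and the boolean argument prgw_high are
hypothetical.

```python
INSTRUCTION_BITS = 32  # every instruction word is 32 bits wide


def program_fetch_cycles(prgw_high):
    """Bus cycles needed to fetch one instruction from STRB0/STRB1 memory.

    prgw_high=True  models PRGW pulled up (16-bit-wide program memory).
    prgw_high=False models PRGW grounded (32-bit-wide program memory).
    """
    memory_width = 16 if prgw_high else 32
    return INSTRUCTION_BITS // memory_width


if __name__ == "__main__":
    print("PRGW high:", program_fetch_cycles(True), "cycles per instruction")
    print("PRGW low: ", program_fetch_cycles(False), "cycle per instruction")
```

The same division applied to a hypothetical 8-bit width gives four cycles
per instruction, which is why 8-bit-wide program memory is not supported.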


Table 4-5. Program Fetch From 32-Bit STRB1 Memory


Data Access    Strobe    Data Size (Bits)    Memory Width (Bits)
Input data     STRB0     32                  32
Output data    IOSTRB    32                  32
Program        STRB1     32                  32
Figure 4-14. Program Fetch From 32-Bit STRB1 Memory
4.6.2 Logical Versus Physical Address


The ’C32 is a 32-bit processor. Its instruction set operates on 32-bit
registers; the CPU itself does not perform 8- or 16-bit data reads or
transfers. When a ’C32 instruction writes to external memory, it sends all
32 bits of data to the memory interface unit through an internal bus. It is
only in the memory interface that the internal 32-bit data can assume 8-bit
or 16-bit form, provided that the address is in the STRB0 or STRB1 range of
the memory map. The data size field of the STRB0 or STRB1 control register
determines the actual size of the data portion that is placed on the
external memory bus of the ’C32. Likewise, when a ’C32 instruction reads a
portion of data from external memory, the memory interface always converts
it to 32 bits as it enters the chip. What happens to the external data as it
goes through the memory interface on the way to the CPU depends on the
contents of the STRB0 and STRB1 control registers. Again, only data whose
address falls within the STRB0 or STRB1 range of the memory map can be
manipulated inside the memory interface unit.


Throughout this document, the term logical address applies to a memory
location that is referenced by ’C32 instructions; the logical address is a
part of the processor’s logical memory map. The physical address refers to
the address that appears at the ’C32 address pins. The valid ranges of the
logical memory map that the program instructions can reference are
determined by:


-  The external memory available in the system


-  The manner in which the external memory address pins are matched with
   the ’C32 address pins (which depends on physical memory width)


-  The contents of the STRB0 and STRB1 registers (which define physical
   memory width and the data size)


The logical memory map shown in Figure 4-15 always contains 32-bit data as
far as the CPU is concerned. It is only when the data passes through the
memory-interface block that the data size can actually change to 8 or 16
bits, as directed by the appropriate strobe control register. For example,
when the processor reads a byte (eight bits) from external memory, the 8-bit
data is sign-extended or padded with 0s as it passes through the memory
interface so that it becomes 32-bit data inside the ’C32. Likewise, when the
processor writes the contents of a 32-bit register to 16-bit-wide external
memory, the internal 32-bit data is truncated to 16 bits as it passes
through the memory interface. The dashed lines inside the logical memory map
in Figure 4-15 show the internal 32-bit representation of the external data
that has a physical size of 8 or 16 bits.
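

The widening and narrowing of integer data described above can be modeled as
follows. This Python sketch is illustrative only: the function names
extend_to_32 and truncate_from_32 are hypothetical, and the choice to keep
the low-order bits on truncation is an assumption for integer data (the text
says only that the data is truncated). Floating-point data is handled
differently, as section 4.6.1.1 notes, and is not modeled here.

```python
def extend_to_32(raw, data_size, sign_extend=True):
    """Widen an 8- or 16-bit external value to its 32-bit internal form."""
    mask = (1 << data_size) - 1
    raw &= mask
    if sign_extend and raw & (1 << (data_size - 1)):
        raw |= 0xFFFFFFFF & ~mask          # copy the sign bit upward
    return raw                             # otherwise the upper bits stay 0


def truncate_from_32(word, data_size):
    """Narrow a 32-bit internal integer to data_size bits.

    Assumption: the low-order bits are the ones kept.
    """
    return word & ((1 << data_size) - 1)


if __name__ == "__main__":
    print(hex(extend_to_32(0x80, 8)))                     # sign-extended byte
    print(hex(extend_to_32(0x80, 8, sign_extend=False)))  # zero-padded byte
    print(hex(truncate_from_32(0x12345678, 16)))          # 16-bit write
```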


Figure 4-15 explains logical/physical addresses and other terms related to
the ’C32 memory interface.
Figure 4-15. Description of Terms Involved In TMS320C32 Memory Interface
4.6.3 32-Bit Memory Configuration Design Examples


The following sections describe examples of interfacing the ’C32 to
32-bit-wide external memory from both the hardware and software-addressing
viewpoints.


4.6.3.1 32-Bit Memory Address Translation for Data Size = Memory Width 


When both data size and memory width are 32 bits, the STRB0 memory interface
behaves like the IOSTRB memory interface. The only difference between the
two is the number of strobe lines connected to the respective memory banks:
four for STRB0 and one for IOSTRB.


Figure 4-16 is a schematic diagram of a 32-bit interface consisting of two
memory banks, each controlled by a separate strobe. The four signal lines of
STRB0 are assigned to the chip-select pins of four 32K x 8 15-ns SRAMs. The
single IOSTRB signal line is connected to the chip-enable pins of four
32K x 8 30-ns EPROMs. For the 60-MHz version of the ’C32, the 15-ns SRAMs
operate with zero wait states and the 30-ns EPROMs require one wait state.
(Software wait states can be programmed in the strobe control registers.)


The hardware memory configuration is depicted in Figure 4-16. Figure 4-17
illustrates the programmer’s view of the hardware memory configuration. The
logical addresses (appearing in program instructions) are represented in the
context of the entire memory map to identify the respective strobes. The
physical addresses are the values that actually appear at the pins of the
processor. Since IOSTRB operates exclusively on 32-bit data types, the
memory interface does not modify the address going in and out of the CPU;
the logical and physical addresses are identical. In this example, STRB0
also operates on 32-bit data, since the data size field of the STRB0 control
register contains a binary value of 11. Since the STRB0 physical memory
width is also 32 bits (see the memory width field in Figure 4-17), there is
no need for address translation from the logical address to its physical
representation.
Figure 4-16. 32-Bit Memory Configuration (STRB0 and IOSTRB)
Figure 4-17. 32-Bit Memory Configuration (STRB0 and IOSTRB)
4.6.3.2 32-Bit Memory Address Translation for Data Size < Memory Width


One memory location can store 2 or 4 data values. Therefore, if the data re- 
quires 16 or 8 bits of precision, the effective addressing range of the same 
physical 32-bit memory is doubled or quadrupled by simply changing the data 
size field of the appropriate strobe control register before the transfers begin. 
The logical-to-physical address translation involves a 2-bit address shift if the 
data size is 8 bits and a 1-bit shift if the data size is 16 bits. The memory inter- 
face automatically performs address shifts and the activation of selected ex- 
ternal memory bytes with appropriate strobe control lines (as directed by the 
strobe control registers). 
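

The shift amounts given above can be checked with a small model. The Python
sketch below is a simplification under stated assumptions: the function name
physical_address is hypothetical, the translation is modeled as a plain
right shift, and any special handling of the uppermost address bits by the
device is not modeled.

```python
def physical_address(logical, data_size, memory_width=32):
    """Model the logical-to-physical shift when data is narrower than memory.

    The shift is log2(memory_width / data_size): 2 bits for 8-bit data and
    1 bit for 16-bit data in 32-bit-wide memory, 0 bits when sizes match.
    """
    if data_size > memory_width:
        raise ValueError("this sketch covers data_size <= memory_width only")
    values_per_location = memory_width // data_size   # 1, 2, or 4
    shift = values_per_location.bit_length() - 1      # log2 of the ratio
    return logical >> shift


if __name__ == "__main__":
    for size in (32, 16, 8):
        print(size, hex(physical_address(0x1000, size)))
```

Because four 8-bit values (or two 16-bit values) share one physical
location, four (or two) consecutive logical addresses map to the same
physical address in this model.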


Figure 4-18 is the schematic diagram of a 32-bit interface consisting of two
memory banks, each controlled by a separate strobe. The four signal lines of
STRB0 are assigned to the chip-select pins of four 32K x 8 15-ns SRAMs, and
the four signal lines of STRB1 are connected to the chip-enable pins of four
32K x 8 30-ns EPROMs. For the 60-MHz version of the ’C32, the 15-ns SRAMs
operate at zero wait states and the 30-ns EPROMs require one wait state.
(Software wait states can be programmed in the strobe control registers.)


Figure 4-19 illustrates the programmer’s view of the hardware memory
configuration depicted in Figure 4-18. The logical addresses (appearing in
program instructions) are represented in the context of the entire memory
map to identify the respective strobes. In this case, the STRB0 memory
transfers operate on 16-bit data to and from 32-bit-wide memory, as defined
in the STRB0 control register. STRB1 accesses 8-bit data to and from
32-bit-wide memory, as defined by the STRB1 control register. Since two
16-bit data values can fit in a single 32-bit-wide memory location
referenced by a single physical address, a mechanism is needed to
distinguish between the 16-bit data portions. This is accomplished by using
the least significant bit (LSB) of the logical address to activate a
different pair of the four STRB0 signal lines for each access, leaving the
second LSB of the logical address to become the LSB of the physical address
and effectively shifting the logical address by one bit. Similarly, STRB1
8-bit data transfers to the 32-bit-wide external memory cause the address to
be shifted by two bits, because the two LSBs of the logical address are used
to select one out of four bytes sharing the same physical 32-bit memory
location.
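

The strobe selection just described can be sketched in a few lines. In this
Python model the function name active_strobe_lines is hypothetical, and the
mapping of the lowest address value to the lowest-numbered strobe lines is
an assumption chosen to agree with the worked examples in section 4.6.5.

```python
def active_strobe_lines(logical, data_size):
    """Return the byte-strobe indices (0..3) asserted for one access to
    32-bit-wide memory, given the logical address and the data size."""
    if data_size == 32:
        return [0, 1, 2, 3]                # all four bytes at once
    if data_size == 16:
        half = logical & 0x1               # LSB picks the strobe pair
        return [2 * half, 2 * half + 1]
    if data_size == 8:
        return [logical & 0x3]             # two LSBs pick one byte
    raise ValueError("data_size must be 8, 16, or 32")


if __name__ == "__main__":
    print(active_strobe_lines(0x900002, 8))    # third byte of the location
    print(active_strobe_lines(0x880001, 16))   # upper half-word
```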


Figure 4-18. 32-Bit Memory Configuration (STRB0 and STRB1)
4.6.4 16-Bit and 8-Bit Memory Configuration Design Examples


This section describes how to interface the ’C32 to both 8- and 16-bit-wide
external memories in the same design from both the hardware and
software-addressing perspectives.


Figure 4-20 contains a schematic diagram of the external memory interface
consisting of two banks, each controlled by a separate strobe. Two of the
four STRB0 signal lines are assigned to the chip-select pins of two
32K x 8 15-ns SRAMs; one of the four STRB1 signals is connected to the
chip-enable pin of one 32K x 8 30-ns EPROM. For the 60-MHz version of the
’C32, the 15-ns SRAMs operate at zero wait states and the 30-ns EPROM
requires one wait state. (Software wait states can be programmed in the
strobe control registers.) Any time the external memory is less than 32 bits
wide, some of the strobe pins switch functions and become additional address
pins. For 16-bit-wide memory, STRB0_B3 becomes A₋₁; for 8-bit-wide memory,
STRB1_B3 and STRB1_B2 become A₋₁ and A₋₂, respectively. This is the only
external change that differentiates the 32-bit-wide memory interface from
the 16- and 8-bit-wide memory interfaces. This feature can be considered
transparent to the software programmer, except that the programmer must
configure the strobe control registers appropriately. The memory interface
automatically drives the additional address lines with correct values,
depending on the size of the data being transferred.


The following three sections illustrate how the physical addresses are derived 
from the logical addresses when the data size is equal to, greater than, and 
less than the width of the physical memory. Though address translation is com- 
pletely automatic, these cases provide insight into the range of physical ad- 
dresses actually affected during transfer of 32-, 16-, and 8-bit data. 
Figure 4-20. 16-Bit and 8-Bit Memory Configuration: A Complete Minimum Design
Note: The EPROM is connected for data access (shifted address) and not for boot table access. This system is booted from the serial port (see INT3 signal).
4.6.4.1 16-Bit and 8-Bit Memory Address Translation for Data Size = Memory Width


As shown in Figure 4-21, when the external memory width matches the size of
data being transferred, the physical address also matches the logical
address with one exception: the physical address is shifted relative to the
logical address by one bit for 16-bit transfers and by two bits for 8-bit
transfers. This means that the address bit that would normally be expected
on pin A0 actually appears on pin A₋₁ or A₋₂. As Figure 4-21 shows, there is
a one-to-one correspondence between logical data and its counterpart in
physical memory.
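

The pin shift described above amounts to a fixed offset between memory
address pins and processor address pins. The Python sketch below is a
wiring aid only; the function name c32_pin_for_memory_pin is hypothetical,
and the pins with negative indices are written here as A-1 and A-2.

```python
# Number of pin positions the address is shifted for each memory width.
PIN_OFFSET = {32: 0, 16: 1, 8: 2}


def c32_pin_for_memory_pin(memory_pin, memory_width):
    """Name the 'C32 address pin that should drive memory pin A<memory_pin>."""
    offset = PIN_OFFSET[memory_width]
    return f"A{memory_pin - offset}"


if __name__ == "__main__":
    for width in (32, 16, 8):
        wiring = [c32_pin_for_memory_pin(n, width) for n in range(3)]
        print(f"{width}-bit memory: memory A0..A2 driven by {wiring}")
```

For a 16-bit-wide memory the sketch pairs memory pin A0 with processor pin
A-1, and for a byte-wide memory it pairs memory pin A0 with A-2, which is
the shift described in this section.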
Figure 4-21. 16-Bit and 8-Bit Memory Address Translation: Data Size = Memory Width
4.6.4.2 16-Bit and 8-Bit Memory Address Translation for Data Size > Memory Width


Figure 4-22 depicts what happens when the data being transferred is larger
than the physical memory in which it is to reside. As shown by the contents
of the strobe control registers, STRB0 controls transfers of 32-bit data to
and from 16-bit-wide physical memory and STRB1 controls transfers of 16-bit
data to and from byte-wide memory. When an instruction stores 32-bit data to
logical address 0h, the memory interface must perform two write cycles to
16-bit-wide external memory. These two write cycles involve two consecutive
addresses, 0h and 1h. Likewise, a 16-bit portion of data logically
referenced with a single address actually requires two physical addresses to
be stored in 8-bit-wide physical memory (as is the case with the STRB1
transfer shown at the bottom of Figure 4-22). To implement these extra bus
cycles, the memory interface appends an extra address bit to the least
significant end of both addresses. As in section 4.6.4.1, the LSBs of the
STRB0 and STRB1 addresses appear at pins A₋₁ and A₋₂, respectively, because
they represent 16- and 8-bit-wide memories.
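

The extra address bits can be modeled by shifting the logical address left
and counting through the added low-order bits. In the Python sketch below,
the function name physical_addresses is hypothetical and the model ignores
the pin shift (A₋₁, A₋₂) discussed above; it lists only the sequence of
physical addresses touched by one logical access.

```python
def physical_addresses(logical, data_size, memory_width):
    """List the physical addresses used by one access when the data is
    at least as wide as the memory, in bus-cycle order."""
    if data_size < memory_width:
        raise ValueError("this sketch covers data_size >= memory_width only")
    cycles = data_size // memory_width       # 1, 2, or 4 bus cycles
    extra_bits = cycles.bit_length() - 1     # address bits appended at the LSB
    base = logical << extra_bits
    return [base | n for n in range(cycles)]


if __name__ == "__main__":
    # 32-bit data at logical address 0h in 16-bit-wide memory.
    print([hex(a) for a in physical_addresses(0x0, 32, 16)])
    # 16-bit data at logical address 1h in 8-bit-wide memory.
    print([hex(a) for a in physical_addresses(0x1, 16, 8)])
```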


Figure 4-22. 16-Bit and 8-Bit Memory Address Translation: Data Size > Memory Width 
4.6.4.3 16-Bit and 8-Bit Memory Address Translation for Data Size < Memory Width


The example in Figure 4-23 is, in a way, an inverse of that in Figure 4-22.
The 8-bit data is transferred to and from 16-bit-wide external memory. To
put this example in perspective, assume that the data transfer is triggered
by the following ’C32 instruction: STI R0,@7FFFh. While in R0, the data is
sized at 32 bits, but when it arrives at the memory interface, the STRB0
control register data size field indicates 8-bit-wide data. So, the 32-bit
data is truncated to 8 bits. The now byte-sized data is transferred to
address 7FFFh of the 16-bit-wide external memory. In this case, the LSB of
the logical address (as referenced by the instruction) is actually rerouted
to control one of the two STRB0 lines assigned to the 16-bit physical
memory. If the LSB is 1 (as in this case), STRB0_B1 is asserted during the
write cycle. If the LSB is 0, STRB0_B0 is asserted during the write cycle.
The remaining bits of the original logical address are placed on the
external address bus starting at pin A₋₁ (because the memory width is 16
bits).
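

The byte write described above can be traced with a short model. In the
Python sketch below, the function name byte_write_to_16bit_memory is
hypothetical, and keeping the low-order eight bits of the register is an
assumption (the text states only that the data is truncated to 8 bits).

```python
def byte_write_to_16bit_memory(logical, register_value):
    """Model an 8-bit STRB0 write to 16-bit-wide memory.

    Returns (strobe asserted, remaining address bits, byte written).
    """
    strobe = "STRB0_B1" if logical & 0x1 else "STRB0_B0"  # LSB picks the byte
    remaining_address = logical >> 1    # these bits go out starting at A-1
    data = register_value & 0xFF        # assumption: low-order byte is kept
    return strobe, remaining_address, data


if __name__ == "__main__":
    strobe, address, data = byte_write_to_16bit_memory(0x7FFF, 0x12345678)
    print(strobe, hex(address), hex(data))
```

For the STI R0,@7FFFh case, the LSB of the logical address is 1, so the
model reports STRB0_B1, in agreement with the description above.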


4.6.4.4 Design Considerations 


While designing the external memory interface to the ’C32, a hardware
engineer must remember to match address pin A₋₁ with the A0 pin of a
16-bit-wide memory, or to match the A₋₂ address pin with the A0 pin of a
byte-wide memory. If the external memory is 32 bits wide, the pins are not
shifted relative to each other and, therefore, match perfectly at A0.


When writing code for the ’C32, the programmer does not have to be concerned
about the structure of the physical memory. The programmer must simply be
aware of the logical memory map and the configuration of the two strobe
control registers. The ’C32 memory interface automatically performs all of
the address translation tasks and byte packing/unpacking necessary to match
variable-size data with physical memories of different widths; these
operations are controlled by the data size and memory width fields of the
STRB0 and STRB1 control registers.
Figure 4-23. 16-Bit and 8-Bit Memory Address Translation: Data Size < Memory Width
4.6.5 One Bank/Two Strobes (32-Bit-Wide Memory) Design Examples


This section describes how to use two strobes in interfacing the ’C32 to a
single physical bank of memory. Such a configuration enables access to
32-bit programs and to two differently sized portions of data out of the
same bank of memory with no speed penalty. This feature is implemented by
internally ANDing STRB0 and STRB1 and outputting the combined strobes on
STRB0 (a total of four lines). The one bank/two strobes memory configuration
is useful in systems in which, for example, a program requiring 32-bit
instruction words for maximum execution speed operates on data that needs
only 16 bits of precision (see Figure 4-27).


Figure 4-24 is the schematic diagram of a 32-bit-wide external memory
configuration arranged as one bank with two separate logical control strobes
sharing the same STRB0 physical signal lines. The four STRB0 signals are
assigned to the chip-select pins of four 32K x 8 15-ns SRAMs, one signal per
chip. For the 60-MHz version of the ’C32, the 15-ns SRAMs operate at zero
wait states. (For slower devices, additional software wait states can be
programmed in the appropriate fields of the strobe control registers.)
Because the total memory width is 32 bits, there is no mismatch between the
processor’s and the memory’s address pins. Therefore, the ’C32 pin A0 is
matched with memory pin A0, A1 is matched with A1, and so on.


As mentioned earlier, both STRB0 and STRB1 signals appear together on the
four STRB0 control pins. This behavior is selected by setting the strobe
configuration bit of the STRB0 control register to 1 (see Figure 4-24).
Since STRB0 and STRB1 are mapped to different ranges of the logical memory
map, the strobe that actually appears on the physical STRB0 pins depends on
the internal address of the data or program word being accessed. The two
strobes effectively split the physical memory in two, with the high memory
address bit selecting either the STRB0 or STRB1 address space. For example,
if all program instructions are fetched from logical addresses 880000h to
881000h and all data reads/writes are confined between 980000h and 981000h,
the program fetches are associated with STRB0 and all data accesses are
driven by STRB1 (see Figure 4-10 for strobe/memory mapping). Since the
behavior of each strobe is determined by a different control register, the
program fetches and data reads/writes can, in each case, differ in the
number of STRB0 lines that are simultaneously driven and in the number of
bus cycles required per access. This is shown in the following sections.
4.6.5.1 One Bank/Two Strobes Address Translation for Data Size = 16 and 8 Bits


Figure 4-25 illustrates how a single physical block of memory can be split
into two separate logical halves, one with 16-bit data and the other with
8-bit data. The access to each half is controlled by a separate strobe
control register with corresponding memory width and data size fields.
Another STRB0 control register field, STRB CONFIG (strobe configuration), is
set to 1 to indicate that both STRB0 and STRB1 are mapped to the same set of
four STRB0 pins. The high memory address pin (in this case, A14) selects
between the two halves of the memory. For this example, the ’C32 address pin
A17 drives the memory pin A14.


The state of the A17 bit of the physical address is derived from the logical
address (logical as seen by the instruction). The state of the A17 bit also
depends on the logical/physical address shift as determined by the size of
the program/data that is being accessed. In this case, the logical STRB0
address range drives the physical address bit A17 to 0 (after accounting for
a 1-bit address shift due to the 16-bit width of the data). Similarly, the
logical STRB1 range drives the physical address bit A17 to 1 (after
accounting for a 2-bit address shift due to the 8-bit width of the data).
The logical STRB0 and STRB1 address ranges selected to drive the physical
address pin A17 to 0 and 1, respectively, must still conform to the logical
memory map that assigns fixed blocks of addresses to different strobe
spaces.


An STI R0,*AR0 instruction (with AR0 = 887FFFh) results in a STRB0 data
access (data size = 16 bits) driving the STRB0_B2 and STRB0_B3 control pins
to write the contents of the 32-bit register R0 into a 16-bit data location
in the lower half of the external memory addressed by 3FFFh. Similarly, an
LDI *AR1,R1 instruction (with AR1 = 98FFFFh) results in a STRB1 data access
(data size = 8 bits) driving the STRB0_B3 control pin (because STRB CONFIG =
1) to read the contents of an 8-bit data location in the upper half of the
external memory addressed by 7FFFh into the 32-bit R1 register. The ’C32
automatically performs all address translation; the programmer needs only to
keep track of the logical memory map and the two strobe control registers.
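

The two accesses in this example can be reproduced with a small model of the
address path. The Python sketch below rests on several assumptions: the
function name sram_access is hypothetical, the translation is modeled as a
plain right shift, and the wiring is taken to be ’C32 pins A0 through A13 to
memory pins A0 through A13 with ’C32 pin A17 driving memory pin A14.

```python
def sram_access(logical, data_size):
    """Return (strobe pins driven, 15-bit SRAM address) for one access to
    the shared 32-bit-wide bank in one bank/two strobes mode."""
    shift = {32: 0, 16: 1, 8: 2}[data_size]
    physical = logical >> shift                 # logical-to-physical shift
    lane = logical & ((1 << shift) - 1)         # low bits select the bytes
    width_bytes = data_size // 8
    strobes = [f"STRB0_B{lane * width_bytes + n}" for n in range(width_bytes)]
    memory_a14 = (physical >> 17) & 0x1         # 'C32 A17 drives memory A14
    sram_address = (memory_a14 << 14) | (physical & 0x3FFF)
    return strobes, sram_address


if __name__ == "__main__":
    for logical, size in ((0x887FFF, 16), (0x98FFFF, 8)):
        strobes, address = sram_access(logical, size)
        print(f"{logical:06X}h, {size}-bit: {strobes} at {address:04X}h")
```

Under these assumptions the model places the 16-bit STRB0 access at 3FFFh in
the lower half of the bank and the 8-bit STRB1 access at 7FFFh in the upper
half, matching the addresses quoted above.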
Figure 4-25. One Bank/Two Strobes Address Translation: Data Size = 16 and 8 Bits
4.6.5.2 One Bank/Two Strobes Address Translation for Data Size = 32 and 8 Bits


Figure 4-26 illustrates how a single physical block of memory can be split
into two separate logical halves, one with 32-bit data and the other with
8-bit data. The access to each half is controlled by a separate strobe
control register with corresponding memory width and data size fields.
Another STRB0 control register field, STRB CONFIG, is set to 1 to indicate
that both STRB0 and STRB1 are mapped to the same set of four STRB0 pins. The
high memory address pin (in this case, A14) selects between the two halves
of the memory. For this example, the ’C32 address pin A17 drives the memory
pin A14.


The state of the A17 bit of the physical address is derived from the logical
address (logical as seen by the instruction). The state of the A17 bit also
depends on the logical/physical address shift as determined by the size of
the program/data that is being accessed. In this case, the logical STRB0
address range drives the physical address bit A17 to 0. Similarly, the
logical STRB1 range drives the physical address bit A17 to 1 (after
accounting for a 2-bit address shift due to the 8-bit width of the data).
Additionally, the logical STRB0 and STRB1 address ranges that drive the
physical address pin A17 to 0 and 1, respectively, must still conform to the
logical memory map that assigns fixed blocks of addresses to different
strobe spaces.


An STI R0,*AR0 instruction (with AR0 = 883FFFh) results in a STRB0 data
access (data size = 32 bits) driving the STRB0_B0, STRB0_B1, STRB0_B2, and
STRB0_B3 control pins to write the contents of the 32-bit register R0 into a
32-bit data location in the lower half of the external memory addressed by
3FFFh. Similarly, an LDI *AR1,R1 instruction (with AR1 = 98FFFFh) results in
a STRB1 data access (data size = 8 bits) driving the STRB0_B3 control pin
(because STRB CONFIG = 1) to read the contents of an 8-bit data location in
the upper half of the external memory addressed by 7FFFh into the 32-bit R1
register. The ’C32 automatically performs all address translation; the
programmer needs only to keep track of the logical memory map and the two
strobe control registers.
Figure 4-26. One Bank/Two Strobes Address Translation: Data Size = 32 and 8 Bits
4.6.5.3 One Bank/Two Strobes Address Translation for Data Size = 16 and 32 Bits


Figure 4—27 illustrates how a single physical block of memory can be split into 
two separate logical halves, one with 16-bit data and the other with 32-bit data. 
The access to each half is controlled by a separate strobe control register with 
corresponding memory width and data size fields. Another STRBO control reg- 
ister field, STRB CONFIG, is set to 1 to indicate that both STRBO and STRB1 
are mapped to the same set of four STRBO pins. The high memory address 
pin (in this case, A14) selects between the two halves of the memory. For this 
example, the C32 address pin A17 drives the memory pin A14. 


The state of the A17 bit of the physical address is derived from the logical
address (logical as seen by the instruction). The state of the A17 bit also
depends on the logical/physical address shift as determined by the size of the
program/data that is being accessed. In this case, the logical STRB0 address
range drives the physical address bit A17 to 0 (after accounting for a 1-bit
address shift due to the 16-bit width of the data). Similarly, the logical STRB1
range drives the physical address bit A17 to 1. The logical STRB0 and STRB1
address ranges that drive the physical address pin A17 to 0 and 1, respectively,
must still conform to the logical memory map that assigns fixed blocks of
addresses to different strobe spaces.


An STI R0,*AR0 instruction (with AR0 = 887FFFh) results in a STRB0 data access
(data size = 16 bits) driving the STRB0_B2 and STRB0_B3 control pins to write
the contents of the 32-bit register R0 into a 16-bit data location in the
lower half of the external memory addressed by 3FFFh. Similarly, an LDI
*AR1,R1 instruction (with AR1 = 923FFFh) results in a STRB1 data access
(data size = 32 bits) driving the STRB0_B0, STRB0_B1, STRB0_B2, and
STRB0_B3 control pins (because STRB CONFIG = 1) to read the contents of
a 32-bit data location in the upper half of the external memory addressed by
7FFFh to the 32-bit R1 register. The 'C32 automatically performs all address
translation; the programmer merely monitors the logical memory map and the
two strobe control registers.
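
The two accesses described above can be checked with a short calculation. The
following C sketch is illustrative only. It assumes that the physical address
is the logical address shifted right by 0, 1, or 2 bits for 32-, 16-, and 8-bit
data, and that memory pins A13-A0 are wired to 'C32 pins A13-A0 while memory
pin A14 is driven by 'C32 pin A17. The function names are hypothetical and are
not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: physical address for a given data size, assuming a
   right shift of 0, 1, or 2 bits for 32-, 16-, and 8-bit data. */
static uint32_t physical_address(uint32_t logical, unsigned data_size_bits)
{
    assert(data_size_bits == 8 || data_size_bits == 16 || data_size_bits == 32);
    unsigned shift = (data_size_bits == 8) ? 2u : (data_size_bits == 16) ? 1u : 0u;
    return (logical & 0xFFFFFFu) >> shift;
}

/* State of physical address pin A17, which selects the memory half. */
static unsigned a17_state(uint32_t logical, unsigned data_size_bits)
{
    return (physical_address(logical, data_size_bits) >> 17) & 1u;
}

/* Address seen by the memory: A13-A0 taken directly, A14 taken from A17
   (wiring assumed as described in the lead-in). */
static uint32_t memory_address(uint32_t logical, unsigned data_size_bits)
{
    uint32_t phys = physical_address(logical, data_size_bits);
    return (phys & 0x3FFFu) | (((phys >> 17) & 1u) << 14);
}
```

Under these assumptions, AR0 = 887FFFh with 16-bit data yields A17 = 0 and
memory address 3FFFh, and AR1 = 923FFFh with 32-bit data yields A17 = 1 and
memory address 7FFFh, which agrees with the text.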
Figure 4-27. One Bank/Two Strobes Address Translation: Data Size = 16 and 32 Bits

4.6.5.4 Example Summary


The one bank/two strobes memory interface to the ‘C32 supports any com- 
bination of data size pairs (16/8, 32/8, and 16/32 bits) with no speed penalty. 
(The strobe control registers do not have to be reconfigured each time the data 
size changes.) Likewise, 16-bit external memory can be divided into two 
halves, each containing data of a different size (8, 16, or 32 bits). The same 
holds true for 8-bit external memory. All address translation information given 
in section 4.6.1 through section 4.6.4 also applies to the one bank/two strobes 
examples. 


To configure the external memory for one bank/two strobes access mode, use 
the following steps: 


1) Set the strobe configuration field in the STRB0 control register to 1.

2) Set the memory width field in both the STRB0 and STRB1 control registers
to reflect the width of the physical memory.

3) Set the data size field in both the STRB0 and STRB1 control registers to
reflect the size of the data portions chosen for each strobe.

4) Choose one of the high physical address bits to split the physical memory
into two halves.

5) For the two memory halves, choose the STRB0 and STRB1 logical address
ranges to drive the chosen bit to 0 and 1, respectively. The chosen
STRB0 and STRB1 address ranges must fit inside the legal STRB0/STRB1
address spaces, as defined by the memory map.


4.6.6 RDY Signal Generation 


The ’C32 uses the RDY pin to determine whether the current bus cycle finishes 
at the end of the current clock cycle or requires additional clock cycles to com- 
plete. Even though the 'C32 can fetch instructions and access data in one 
clock cycle, a slow memory may need additional clock cycles (wait states) to 
complete the bus cycle. The RDY signal can be handled in one of three ways: 


- The RDY pin can be permanently grounded, indicating to the CPU that the
  external memory is always ready for the next cycle. This is used where all
  external memory is fast enough to preclude wait states.

- The wait states can be programmed in software by setting bits in the
  corresponding strobe control registers, if there is only one device per
  strobe. This method can be used even if there are external devices that
  require wait states. The RDY pin must be permanently grounded.

- The RDY signal can be actively generated by external logic. This is
  required only if a single strobe controls two or more external memory
  banks or peripherals requiring different numbers of wait states.


The remainder of this section describes the active generation of the RDY
signal. The example involves three memory banks controlled by STRB0, each
requiring a different number of wait states. This example directly applies to
RDY signal generation involving STRB1 and is similar to the case of IOSTRB,
which involves a more relaxed set of timing parameters.


4.6.6.1 RDY Signal Timing Parameters for STRB0 and STRB1




Figure 4-28 and Table 4-6 contain STRB0 and STRB1 timing parameters that
are typically used to generate the RDY signal. As evident in the read and write
timing waveforms, the RDY signal generated by the external logic is clocked
into the 'C32 on the falling edge of the H1 clock. The associated setup time is
represented by parameter 17 and the hold time by parameter 18. Thus, for the
60-MHz 'C32, the RDY signal must arrive at the RDY pin at least 17 ns before
the falling edge of H1 and remain valid at least until H1 goes low. Timing
parameters 11 and 12 represent the STRB0 and STRB1 low and high delays from
the falling edge of H1. Timing parameter 14 represents the address valid delay
from the falling edge of H1. For back-to-back write cycles, timing parameter
22 represents the address valid delay from the rising edge of H1. Parameters
11, 12, 14, and 22 do not directly apply to RDY setup and hold, but are
nevertheless involved in the generation of the RDY signal.
Figure 4-28. RDY Signal Timing for STRB0 and STRB1 Cycles

[Figure: RDY timing waveforms for STRB0 and STRB1 read and write cycles]

Table 4-6. RDY Signal Generation

                                                      'C32-40†    'C32-50†    'C32-60†
                                                      (50 ns)     (40 ns)     (33 ns)
No.  Parameter    Description                         Min  Max    Min  Max    Min  Max   Unit

11   td(H1L-SL)   Delay time, H1 low to STRBx low     0    11     0    9      0    8     ns
12   td(H1L-SH)   Delay time, H1 low to STRBx high         11     0    9                 ns
14   td(H1L-A)    Delay time, H1 low to A valid            11     0    9                 ns
17   tsu(RDY)     Setup time, RDY before H1 low       21          19          17         ns
18   th(RDY)      Hold time, RDY after H1 low         0           0           0          ns
22   td(H1H-A)    Delay time, H1 high to A valid on        11          9           8     ns
                  back-to-back write cycles (write)

† These timing specifications are subject to change without notice. See the
TMS320C32 Digital Signal Processor data sheet for current timing information.
4.6.6.2 RDY Signal Generation for STRB0 Signals

Figure 4-29 shows three memory banks controlled by a single strobe
(STRB0). The first bank is composed of four 8-bit-wide SRAMs requiring zero
wait states to operate at 60 MHz (15-ns devices). Bank 2 is composed of two
1-wait-state SRAMs, and bank 3 contains one 3-wait-state EPROM (which is
8 bits wide). The RDY pin is normally high, indicating a not-ready state. It goes
low if either RDY_BANK1 or RDY_BANK23 goes low.


The RDY_BANK1 signal is asserted only if two conditions are satisfied:

- At least one of the four STRB0 signal lines must be active.
- The three address decode bits must match the bank 1 space.

Since no wait states are involved, the RDY_BANK1 signal does not have to
be synchronized with the H1/H3 clocks, and, therefore, it can directly drive the
RDY pin after being gated with its bank 2/bank 3 counterpart.
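
The bank 1 gating can be modeled in a few lines of C. This is only a sketch of
the combinational behavior, not the PAL equations used in the design. It
assumes that the three decode bits are A23, A18, and A17 with bank 1 selected
by A23 = 0 and A18-A17 = (0,1), as assigned in section 4.6.7, and the function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* All signals are active low: 0 = asserted, 1 = inactive. */

/* strb0_lines: bits 3..0 hold STRB0_B3..STRB0_B0 (0 = active).
   Bank 1 decode is assumed to be A23 = 0, A18 = 0, A17 = 1. */
static unsigned rdy_bank1(unsigned strb0_lines, uint32_t phys_addr)
{
    assert((phys_addr >> 24) == 0);   /* 24-bit address bus */
    int any_strobe_active = (strb0_lines & 0xFu) != 0xFu;
    int bank1_selected = ((phys_addr >> 23) & 1u) == 0u &&
                         ((phys_addr >> 18) & 1u) == 0u &&
                         ((phys_addr >> 17) & 1u) == 1u;
    return (any_strobe_active && bank1_selected) ? 0u : 1u;
}

/* RDY goes low if either bank signal is low, so the two active-low
   signals are ANDed together. */
static unsigned rdy_pin(unsigned rdy_bank1_n, unsigned rdy_bank23_n)
{
    return rdy_bank1_n & rdy_bank23_n & 1u;
}
```

For instance, with only STRB0_B0 active (strb0_lines = 0xE) and a bank 1
address such as 020000h, rdy_bank1() returns 0 and the RDY pin is driven low
regardless of the bank 2/bank 3 signal.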


The STRB0_BANK23 signal becomes active (high) if the three address decode
bits match bank 2 or bank 3 address spaces while STRB0_B0 and/or
STRB0_B1 are active (low). The STRB0_BANK23 signal, when high, sets a
high data state in a synchronous progression through a chain of four registers.
Depending on which point in the chain is tapped, a RDY signal delay ranging
from zero to three wait states can be achieved. In this case, both 1-wait-state
and 3-wait-state taps assert the RDY_B23YES signal to reflect bank 2 or bank
3 access. Finally, a 2-register circuit removes the trailing active low edge of the
RDY_B23YES signal by ORing it with RDY_B23NOT (see Figure 4-30). The
resulting RDY_BANK23 is ANDed with its bank 1 counterpart to drive the RDY
pin.
Figure 4-30 contains timing waveforms for RDY signal generation. It
illustrates how the RDY signal is generated for a series of external back-to-back
memory read cycles in which the first cycle accesses bank 1 (zero wait states),
the second cycle accesses bank 2 (one wait state), the third cycle accesses
bank 3 (three wait states), and the fourth and fifth cycles access bank 1 (zero
wait states). For each read cycle, the RDY waveform is marked with a resulting
setup time. For the 60-MHz device, the RDY signal must become valid at least
17 ns before every falling edge of the H1 clock.


In the 0-wait-state cycle, the address and strobe signals become valid 8 ns 
from the falling edge of H1. An additional 5 ns are needed for a single pass 
through a fast combinational logic device for a total setup time of the resulting 
RDY signal equal to 20 ns. This leaves 3 ns for board delays and a modest 
safety factor. 


For the 1- and 3-wait-state cycles, the bank decode and strobe signals do not
directly drive the RDY signal. They are instead combined into the
STRB0_BANK23 signal that, when active, releases the clear condition on the
3-register delay chain driven by the H3 clock. The register chain is then free
to propagate a high state at the rate of one register per clock cycle. The two
taps in the register chain (at the first and third registers, representing one wait
state and three wait states, respectively) are ORed with their corresponding
bank select signals to result in the RDY_B23YES signal synchronous to H1/H3
clocks. The RDY_B23YES leading-edge 10-ns delay is caused by two passes
through a fast PAL® device (such as a 22V10). The trailing edge of this signal
is caused by bank 2 or bank 3 decode circuits going inactive after the RDY
signal is recognized by the processor. The address decode (8 ns) plus two passes
through the PAL (5 + 5 ns) combine for a total delay of 18 ns that can cut into
the next cycle's RDY setup requirement (33 - 18 = 15 ns) if not modified. To
deactivate the RDY signal sooner, a single-register circuit is added to generate
the RDY_B23NOT, which, when ORed with the RDY_B23YES, yields the
RDY_BANK23 signal that satisfies the RDY setup time for the next cycle.
Finally, RDY_BANK1 and RDY_BANK23 are ANDed together to produce the
final RDY signal that is wired to the processor's RDY pin.
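
The setup-time arithmetic used in this section can be written out as a small
C sketch. It simply subtracts the delays quoted in the text from the 33-ns H1
period of the 60-MHz device and compares the result with the 17-ns setup
requirement (parameter 17). The function names are hypothetical, and board
delays are not modeled.

```c
#include <assert.h>

/* Setup time achieved at the RDY pin: the H1 period minus every delay
   between the falling edge of H1 and a valid RDY level. */
static int rdy_setup_ns(int h1_period_ns, int signal_valid_delay_ns,
                        int logic_delay_ns)
{
    assert(h1_period_ns > 0);
    return h1_period_ns - signal_valid_delay_ns - logic_delay_ns;
}

/* Margin against the required setup time; a negative value means the
   setup requirement is violated. */
static int rdy_margin_ns(int achieved_setup_ns, int required_setup_ns)
{
    return achieved_setup_ns - required_setup_ns;
}
```

With the values from the text, the 0-wait-state path (8 ns plus one 5-ns logic
pass) achieves 20 ns of setup and a 3-ns margin. The unmodified trailing edge
(8 ns plus two 5-ns passes) leaves only 15 ns, which is 2 ns short of the
requirement; this is the case that the RDY_B23NOT circuit corrects.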
Figure 4-30. RDY Signal Generation Timing Waveforms

4.6.7 Address Decode for Multiple Banks

Figure 4-31 illustrates the logical-to-physical address translation for the three
memory banks used in the RDY signal generation example in section 4.6.6.
Each memory bank is a different physical width, as shown by the physical
address column on the right side of the figure. The left side of the figure
represents the internal (logical) address ranges for each of the three memory banks.
Logical-to-physical address translation is controlled by strobe control registers
and by their data size and memory width fields. The middle column of
Figure 4-31 shows the logical address field (top row) over the physical
address (bottom row) for each address translation case. The active address
fields are shaded gray, and the inactive address bits are white. The black fields
are special address bits that can selectively control multiple strobe lines or
choose between individual portions of a data word that is larger than the
physical memory it is accessing.


For example, in bank 2, the right side of the figure indicates that the physical
memory width for this bank is 16 bits. The left side indicates that, regardless
of the physical memory width, 32-, 16-, and 8-bit data can be moved by
programming the STRB0 control register. The low-order (shaded) bits of
logical/physical address rows show how many bits are actually used for addresses
so that the correct high-order address bits can be assigned to bank decode.
Physical address bits A17 and A18 are chosen for bank decode because they
lie outside the used address bits. A17 and A18 decode between banks 1, 2,
and 3, with A18-A17 = (0,1) assigned to bank 1, (1,0) assigned to bank 2, and
(1,1) assigned to bank 3. Address bit A23 is set to 0 to isolate the STRB0
address space from the STRB1 and IOSTRB memory maps.
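
The bank assignment above amounts to a two-bit decode. The following C sketch
shows it under the assignments just listed; the function name is hypothetical,
and the (0,0) combination is treated as unassigned because the text does not
give it a bank.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the bank number (1-3) selected by physical address bits A18-A17,
   or 0 if the address does not select one of the three STRB0 banks. */
static int decode_bank(uint32_t phys_addr)
{
    assert((phys_addr >> 24) == 0);      /* 24-bit address bus */
    if ((phys_addr >> 23) & 1u)          /* A23 = 1: outside this STRB0 range */
        return 0;
    switch ((phys_addr >> 17) & 0x3u) {  /* A18-A17 */
    case 0x1: return 1;                  /* (0,1) */
    case 0x2: return 2;                  /* (1,0) */
    case 0x3: return 3;                  /* (1,1) */
    default:  return 0;                  /* (0,0): unassigned in this example */
    }
}
```

Physical addresses 020000h, 040000h, and 060000h therefore fall in banks 1, 2,
and 3, respectively.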


The dotted lines bounding the bank decode bits allow you to see that the
external address bits, A18-A17, line up perfectly, but their logical address
counterparts do not. The amount of reverse shift between the logical and physical
addresses depends on the size of the data being accessed and the width of the
physical memory. Each of the three address translation cases for each of the
three banks translates physical address bits A18-A17 into two contiguous
logical address bits that can lie anywhere between A20 and A17. Once the
logical images of the external bank decode bits are identified along with low-order
address bits and the A23 strobe decode bit, they define the final logical
memory map for the three STRB0 banks together.
Figure 4-31. Address Decode for Multiple Memory Banks

Note: Active address fields are shaded gray; inactive address bits are white. The black fields are special address bits that
control multiple strobe lines or choose between portions of a data word that is larger than the physical memory it is accessing.

Each memory bank actually has three logical memory maps, depending on the
size of the data being accessed and the setting of the corresponding bits in the
STRB0 control register.


The address ranges in these logical memory maps are all different, yet all three 
maps translate perfectly into a single physical address map that identifies the 
bank. In using the three logical memory maps, the programmer must exercise 
caution to prevent overwriting 8-bit data with 16-bit data (or 16-bit data with 
32-bit data) that may have a different logical address but still occupy the same 
place in physical memory. To be certain that the logical address maps 
associated with 8-, 16-, and 32-bit data sizes do not overlap within a single 
physical memory bank, the three logical maps must be further divided into 
mutually exclusive areas before they are used by the programmer. Further- 
more, when a program jumps from one physical memory bank to another of 
a different width, the memory width configuration bits in the appropriate strobe 
register must be changed. 
4.7 How TMS320 Tools Interact With the TMS320C32's Enhanced Memory
Interface


The 'C32's memory interface accesses external memory through one 24-bit
address bus and one 32-bit data bus. The data bus is shared by three mutually
exclusive strobes: STRB0, STRB1, and IOSTRB. Depending upon the
address accessed, the 'C32 activates one of these strobes. (See the
TMS320C3x User's Guide for more information about memory maps.)


STRB0 and STRB1 can access 8-, 16-, or 32-bit data quantities from 8-, 16-,
or 32-bit-wide memory. Access is achieved by four signals within each strobe.
These signals are:

- STRBx_B3/A-1
- STRBx_B2/A-2
- STRBx_B1
- STRBx_B0


The listed signals serve as byte-enable pins for accessing a byte, half-word, 
or full-word from external memory. The first two signals also serve as addition- 
al address pins when performing two or four consecutive accesses in 8- or 
16-bit-wide external memory. The data accessed is truncated, packed, or un- 
packed accordingly, with no additional overhead. The following list shows the 
behavior of these pins, as dictated by the data size and memory-width bit 
fields. 


The default value of a strobe control register depends on the program memory 
width select (PRGW) pin level. 


- 8-bit-wide memory
  - STRBx_B3/A-1 and STRBx_B2/A-2 are address pins.
  - STRBx_B0 is a byte-enable/chip-select signal.
  - STRBx_B1 is not used.

- 16-bit-wide memory
  - STRBx_B3/A-1 is an address pin.
  - STRBx_B1 and STRBx_B0 are byte-enable signals.
  - STRBx_B2/A-2 is not used.

- 32-bit-wide memory
  - STRBx_B3/A-1, STRBx_B2/A-2, STRBx_B1, and STRBx_B0 are
    byte-enable signals.

- Data size:
  - 8-bit data: The physical address is the logical address shifted right
    by 2.
  - 16-bit data: The physical address is the logical address shifted right
    by 1.
  - 32-bit data: The physical address is the logical address.


IOSTRB can access 32-bit data from 32-bit-wide memory. However, IOSTRB
does not have the flexibility of STRB0 and STRB1 because it is composed of
a single signal. IOSTRB bus cycles differ from STRB0 and STRB1 bus cycles.
(See the Interlocked Operations section in the Program Flow Control chapter
of the TMS320C3x User's Guide for more information.) This timing difference
accommodates slower I/O peripherals.


The ’C32 also supports program execution from 16- and 32-bit external 
memory widths. Execution is controlled through the status of the PRGW pin. 
When this pin is pulled high, the ‘C32 executes from 16-bit-wide memory. 
When the PRGW pin is pulled low, the C32 executes from 32-bit-wide 
memory. For 16-bit-wide zero-wait-state memory, the ‘C32 takes two instruc- 
tion cycles to fetch a single 32-bit instruction. The lower 16 bits of the instruc- 
tion are obtained during the first cycle; the upper 16 bits are retrieved and con- 
catenated with the lower 16 bits during the second cycle. The ’C32’s 32-bit 
memory fetches are identical to those of the ‘C30 and ’C31. 
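
The two-cycle fetch from 16-bit-wide memory can be pictured as the following
concatenation. This is a host-side C sketch of the arithmetic only, with a
hypothetical function name; it does not model bus timing.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 16-bit fetches into one 32-bit instruction word: the first
   fetch supplies the lower 16 bits, the second the upper 16 bits. */
static uint32_t assemble_instruction(uint16_t first_fetch, uint16_t second_fetch)
{
    return ((uint32_t)second_fetch << 16) | first_fetch;
}

/* Small self-check showing the expected ordering of the two halves. */
static void check_assemble_instruction(void)
{
    assert(assemble_instruction(0x5678u, 0x1234u) == 0x12345678u);
}
```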


In summary, the 'C32 memory interface parallel bus implements three mutually
exclusive address spaces that are distinguished through the use of three
separate control signals (see Figure 4-32). STRB0 and STRB1 support 8-,
16-, and 32-bit data access in 8-, 16-, and 32-bit-wide external memory and
32-bit program access in 16/32-bit-wide external memory. IOSTRB address
space supports 32-bit data/program access in 32-bit-wide external memory.
Internally, the 'C32 has a 32-bit architecture; accordingly, the memory
interface packs and unpacks the data accessed. Three strobe control registers
manipulate the variable-width memory interface of the 'C32. (See the
TMS320C3x User's Guide for a detailed description of the 'C32 memory
interface.)

Figure 4-32. TMS320C32 Memory Address Spaces

4.7.1 C Compiler Interaction With the TMS320C32 Memory Interface


The ’C32’s internal 32-bit architecture allows the C compiler’s data types to re- 
main 32 bits wide. However, the C compiler’s runtime-support library includes 
pragma directives and new dynamic-allocation routines (malloc, realloc, cal- 
loc, bmalloc, free, etc.) that support the creation of data sections. These data 
sections serve as memory pools for storing 8- and 16-bit data. These sections 
can reside in 8-, 16-, and 32-bit-wide memory. The programmer must ensure 
that the appropriate strobe control register is loaded with the correct data size 
and memory width. The ’C32’s memory interface truncates, packs, or unpacks 
the data in the manner specified by the settings of the strobe control register. 
Table 4—7 lists the data sizes supported by the sections created by the C com- 
piler. 


Table 4-7. Data Sizes Supported by Sections Created by the C Compiler

Section Type     32 Bits          16 Bits          8 Bits

Initialized      .text            .user_section    .user_section
                 .cinit
                 .const
                 .user_section

Uninitialized    .bss             .sysm16          .sysm8
                 .stack           .user_section    .user_section
                 .sysmem
                 .user_section
The contents of the named sections are as follows:

- .text: executable code and/or string literals
- .cinit: tables for variable and constant initialization
- .const: string literals and switch tables
- .bss: global variables and statically allocated variables
- .stack: system stack used to pass function arguments and to allocate local
  function variables
- .sysmem: memory pool for dynamic allocation of 32-bit data
- .sysm16: memory pool for dynamic allocation of 16-bit data
- .sysm8: memory pool for dynamic allocation of 8-bit data
- .user_section: section created using the #pragma DATA_SECTION directive


The following sections describe the C compiler's preprocessor pragma and
modules in the runtime-support library that support 8- and 16-bit memory
pools. The 32-bit memory pools are handled through the standard minit(),
malloc(), bmalloc(), calloc(), realloc(), and free() routines, which operate on the
.sysmem section.


4.7.1.1 DATA_SECTION Pragma Directive 
To support additional memory pools, the C compiler uses a data section
pragma directive. This directive instructs the C compiler to allocate space of
size symbol_size for symbol_name in the section specified by section_name. (See
the TMS320 Floating-Point DSP Optimizing C Compiler User's Guide for
additional information.) The syntax for DATA_SECTION is as follows:


#pragma DATA_SECTION(symbol_name, "section_name")
type symbol_name;


For example, place an array of 1K integer values in a new section called
.mydata in the following manner:


#pragma DATA_SECTION(dataBuf, ".mydata")
int dataBuf[1024];

4.7.1.2 MEMORY8.C Module


The MEMORY8.C module contains functions that implement dynamic 
memory management routines for using 8-bit data with the ‘C32. (See the 
TMS320C3x/C4x Optimizing C Compiler User’s Guide for more information on 
8-bit runtime-support functions.) 


The pragma directive in the MEMORY8.C module defines a .sysm8 section. 
The size of this memory pool in words (system memory or heap) is set at link 
time by using the -heap8 option. If the -heap8 option is not used, the compiler 
does not allocate an 8-bit system memory area. If arguments are not used in 
conjunction with this switch, the size of the 8-bit system memory area defaults 
to 1K 8-bit words. The following functions operate in the 8-bit .sysm8 section: 


- minit8(): initializes and resets the 8-bit dynamic memory management
  system

- malloc8(): allocates 8-bit words from the 8-bit memory pool and returns
  a pointer to the allocated space

- calloc8(): allocates 8-bit words from the 8-bit memory pool, clears
  allocated memory locations, and returns a pointer to the allocated space

- realloc8(): reallocates 8-bit words from previously unallocated areas in
  the 8-bit memory pool; a pointer to the allocated space is returned

- free8(): frees previously allocated space from the 8-bit memory pool

- bmalloc8(): allocates 8-bit words from the 8-bit memory pool. The
  allocated words are aligned to a boundary that is suitable for the 'C32's
  circular and bit-reversed buffers; a pointer to the allocated space is returned.

- _SYSMEM8_SIZE: an external label that contains the size, in words, of
  the 8-bit system memory pool


4.7.1.3 MEMORY16.C Module 


The MEMORY16.C module contains functions that implement dynamic 
memory management routines for the ’C32’s 16-bit data. (See the 
TMS320C3x/C4x Optimizing C Compiler User’s Guide for more information on 
16-bit runtime-support functions.) 


The pragma directive in the MEMORY16.C module defines a .sysm16 section.
The size of this memory pool in words (system memory or heap) is set at link
time by using the -heap16 option. If the -heap16 option is not used, the
compiler does not allocate a 16-bit system memory area. If arguments are not
used in conjunction with this switch, the size of the 16-bit system memory area
defaults to 1K 16-bit words. The following functions operate in the 16-bit
.sysm16 section:

- minit16(): initializes and resets the 16-bit dynamic memory management
  system

- malloc16(): allocates 16-bit words from the 16-bit memory pool and
  returns a pointer to the allocated space

- calloc16(): allocates 16-bit words from the 16-bit memory pool, clears
  allocated memory locations, and returns a pointer to the allocated space

- realloc16(): reallocates 16-bit words from previously unallocated areas
  in the 16-bit memory pool; a pointer to the allocated space is also returned

- free16(): frees previously allocated space from the 16-bit memory pool

- bmalloc16(): allocates 16-bit words from the 16-bit memory pool. The
  allocated words are aligned to a boundary that is suitable for the 'C32's
  circular and bit-reversed buffers; a pointer to the allocated space is also
  returned.

- _SYSMEM16_SIZE: an external label that contains the size, in words,
  of the 16-bit system memory pool


4.7.1.4 Memory Pool Limitations 


The 'C32 has only three strobes: STRB0, STRB1, and IOSTRB. This means
a programmer cannot have more than three memory pools, with one memory pool
assigned to each strobe. IOSTRB can hold only 32-bit data and can only
accommodate the 32-bit memory pool .sysmem. Conversely, STRB0 and
STRB1 can hold 8-, 16-, and 32-bit data and can accommodate the 8-, 16-, and
32-bit memory pools .sysm8, .sysm16, and .sysmem.


All pointers and constants must be stored in memory configured to hold 32-bit 
data. Hence, the .bss, .stack, .cinit, and .const sections must reside in memory 
with data size configured to 32 bits. 


4.7.2 C Compiler and Assembler Switch 
To create code for the 'C32, the assembler and C compiler use the -v32 version
specification switch. The following example demonstrates the use of this
switch with the assembler and C compiler, respectively:

asm30 -v32 myfile.asm

cl30 -v32 myfile.c

4.7.3 Linker Switches


To support the ’C32’s 8- and 16-bit memory pools, the linker uses the following 
switches: -heap8, -heap16, and -heap. These switches set the size, in words, 
of the respective 8-, 16-, and 32-bit memory system areas .sysm8, .sysm16, 
and .sysmem. The user must link these sections into the appropriate address- 
es, thereby activating strobes that are configured to access 8-, 16-, or 32-bit 
data. 


The following example demonstrates the link-time sizing of an 8-bit memory
pool to 16K (0x4000) words:


lnk30 -heap8 0x4000


The linker creates these memory system areas using an input file that contains 
the .sysmem, .sysm8, and .sysm16 data-section definitions. If the input file 
does not exist, the linker is unable to perform memory area processing. 


The linker also creates the global symbols _SYSMEM_SIZE,
_SYSMEM8_SIZE, and _SYSMEM16_SIZE and subsequently assigns each a
value equal to the respective -heap, -heap8, and -heap16 size. The default size
for each memory system area is 1K words (word size depends on system
memory width).


4.7.4 Debugger Configuration 


For the debugger to properly disassemble and read/write external memory, 
the user must configure the strobe control registers before loading and execut- 
ing code. Because the ‘C32 supports code execution from 16- or 32-bit 
memory, the debugger may need to temporarily set the strobe control register 
to a 32-bit data size in order to write an instruction (either by loading code or 
patching code) or to read an instruction with the objective of disassembling a 
range of program memory. 


To support code execution from 16- and 32-bit memory, the memory map add 
(ma) command includes a new type parameter that directs the debugger to 
treat .text sections as 32-bit data. While reading or writing .text sections, the 
debugger does the following: 


- Temporarily stores the configuration of the appropriate strobe control
  register

- Temporarily sets the data size to 32 bits

- Reads or writes the targeted portion of the .text section

- Restores the strobe control register to its previous value
The syntax for the memory map add command is:


ma address, length, type


where:

address   defines the starting address of a range of memory

length    defines the length of the memory range

type      identifies the read/write characteristic of the memory range,
          depending upon one or more of the following keywords:

          - R: read only
          - W: write only
          - WR or RAM: read/write
          - PROTECT: no-access memory
          - TX: memory that stores a .text (code) section


4.7.5 TMS320C32 Configuration Examples 


This section describes the possible 'C32 memory interface configurations,
including instructions on how to allocate buffers, build link files, and configure
the debugger for each memory configuration.


4.7.5.1 Two External Memory Banks 
The 'C32's external memory interface allows the use of two zero-wait-state
external memory banks with different widths without requiring additional logic or
incurring access penalty costs. These external memory banks provide
flexibility in balancing performance and system cost (performance and system cost
increase with wider memory chips). For example, the programmer can
execute code from 32-bit-wide memory while storing data in 8-bit memory (see
Figure 4-33). This approach is advantageous for applications with large
amounts of 8-bit data that require execution at the fastest speed of the device.
Figure 4-33. Zero-Wait-State Interface for 32-Bit and 8-Bit SRAM Banks

[Figure: the TMS320C32 connected to a 32-bit-wide memory bank built from four
8-bit SRAMs on data lines D(31-24), D(23-16), D(15-8), and D(7-0), selected by
STRB0_B3 through STRB0_B0, and to one 8-bit-wide memory bank that uses
STRB1_B3/A-1, STRB1_B2/A-2, and STRB1_B0.]


In Figure 4-33, a bank of 32K x 32 bits is mapped to STRB0, and a bank of
32K x 8 bits is mapped to STRB1. For this configuration, the programmer must
set the following:

- STRB0 control register physical memory width to 32 bits and the data type
  size to 32 bits

- STRB config bit field to 0, that is, STRB0 control register = 000F0000h
  (banks are separate)

- STRB1 control register physical memory width to 8 bits and the data type
  size to 8 bits, that is, STRB1 control register = 00000000h

Additionally, the PRGW pin must be pulled low to indicate 32-bit program
memory width.
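
The two register values quoted above can be reproduced with a short C sketch.
The field positions and encodings used here are assumptions chosen to be
consistent with the values 000F0000h and 00000000h (data type size in bits
17-16, memory width in bits 19-18, with 8 bits encoded as 00 and 32 bits as
11); the 16-bit encoding of 01 is likewise assumed. Verify all of them against
the strobe control register description in the TMS320C3x User's Guide. The
helper names are hypothetical, and all other register fields are left at 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions; check the User's Guide before relying on them. */
#define DATA_SIZE_SHIFT 16u   /* assumed: data type size in bits 17-16 */
#define MEM_WIDTH_SHIFT 18u   /* assumed: physical memory width in bits 19-18 */

/* Encode a width or size of 8, 16, or 32 bits as a 2-bit field value. */
static uint32_t encode_bits(unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32);
    return (bits == 8) ? 0x0u : (bits == 16) ? 0x1u : 0x3u; /* 16 -> 01 assumed */
}

/* Build a strobe control register value from width and data size only. */
static uint32_t strb_ctrl_value(unsigned mem_width_bits, unsigned data_size_bits)
{
    return (encode_bits(mem_width_bits) << MEM_WIDTH_SHIFT) |
           (encode_bits(data_size_bits) << DATA_SIZE_SHIFT);
}
```

Under these assumptions, strb_ctrl_value(32, 32) gives 000F0000h for STRB0 and
strb_ctrl_value(8, 8) gives 00000000h for STRB1, matching the settings listed
above.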
Figure 4-33 also maps the 32-bit-wide bank's external memory address pins,
A14A13...A1A0, to the 'C32's A14A13A12...A1A0 pins. Conversely, the 8-bit-wide
bank's memory address pins, A14A13...A1A0, are mapped to the 'C32's
A12...A1A0A-1A-2 pins. Because STRB1 is configured for 8-bit memory width, the
external address presented on the 'C32 pins is shifted right by two bits. As a
result of this mapping, external memory accesses in the range 0h through 7FFFh
read or write 32-bit data to the 32-bit-wide bank (STRB0). Memory accesses
in the range 900000h through 907FFFh read or write 8-bit data to the
8-bit-wide bank (STRB1).
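
The pin mapping for the 8-bit-wide bank can be sketched in C as follows. The
sketch assumes that, for 8-bit data in 8-bit-wide memory, 'C32 pins A12-A0
carry logical address bits 14-2 and that pins A-1 and A-2 carry logical
address bits 1 and 0. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STRB1_BASE 0x900000u   /* start of the 8-bit bank in this example */

/* Address presented to the 32K x 8 bank (memory pins A14-A0) for a logical
   address in the range 900000h-907FFFh. */
static uint32_t bank8_memory_address(uint32_t logical)
{
    assert(logical >= STRB1_BASE && logical <= STRB1_BASE + 0x7FFFu);
    uint32_t c32_a12_a0  = (logical >> 2) & 0x1FFFu; /* 'C32 pins A12-A0 */
    uint32_t c32_am1_am2 = logical & 0x3u;           /* 'C32 pins A-1, A-2 */
    return (c32_a12_a0 << 2) | c32_am1_am2;          /* memory pins A14-A0 */
}
```

Each logical address in the STRB1 range therefore selects one distinct byte of
the bank: 900000h maps to memory address 0h and 907FFFh maps to 7FFFh.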


Two banks of different memory widths must not be connected to the same 
STRB without external decode logic. Different memory widths require 
STRBx_Bx signals to be configured as address pins. These address pins are 
active for any external memory access, that is, STRBO, STRB1, IOSTRB, and 
program fetches. 


8-bit Dynamic Memory Allocation 


This section contains C code examples of 8-bit dynamic buffer allocation, link- 
er configuration, and a debugger batch file. 


Example 4—1 demonstrates the allocation of two buffers (1K and 4K 8-bit 
words) using the 8-bit dynamic memory allocation routines. 


Example 4-1. 8-Bit Dynamic Buffer Allocation 


void main()
{
    int   *buffer1;
    float *buffer2;

    /* Configure the STRB0 control register for 32-bit wide memory,
       32-bit data size. */
    *(volatile unsigned int *)0x808064 = 0xF0000;

    /* Configure the STRB1 control register for 8-bit wide memory,
       8-bit data size. */
    *(volatile unsigned int *)0x808068 = 0x00000;

    /* Allocate 1K 8-bit words in the 8-bit memory pool. */
    buffer1 = malloc8(1024 * sizeof(int));

    /* Allocate 4K 8-bit floats in the 8-bit memory pool. */
    buffer2 = malloc8(4096 * sizeof(float));

    /* Process buffers. */
    callDSPoperation(buffer1, buffer2);

    /* Free buffers. */
    free8(buffer2);
    free8(buffer1);
}


Note:

The TMS320 floating-point C compiler sizeof function returns 1 for both
integer and float data types.




Example 4—2 allocates sections of the preceding code into the desired 
memory configuration. 


Example 4-2. Linker Command File 


sample.obj                /* Input filename                */
-heap8 32768              /* Set 8-bit memory pool size.   */
-stack 8704               /* Set C system stack size.      */
-o sample.out             /* Specify output file.          */
-m sample.map             /* Specify map file.             */
MEMORY
{
    PRGRAM   : org = 0x0000,   len = 0x2000
    STRB0RAM : org = 0x2000,   len = 0x6000
    ONCHIRAM : org = 0x87FE00, len = 0x200
    STRB1RAM : org = 0x900000, len = 0x8000
}
SECTIONS
{
    .text  > PRGRAM       /* 32-bit data section */
    .cinit > STRB0RAM     /* 32-bit data section */
    .const > STRB0RAM     /* 32-bit data section */
    .bss   > STRB0RAM     /* 32-bit data section */
    .stack > STRB0RAM     /* 32-bit data section */
    .sysm8 > STRB1RAM     /* 8-bit memory pool mapped to STRB1 */
}


The debugger batch file shown in Example 4-3 executes initialization
commands that configure the C source debugger to handle a ’C32 with the
memory configuration shown in Figure 4-33 on page 4-75.

Example 4-3. Debugger Batch File

mr
sconfig init.clr
                             ; Define memory configuration.
ma 0x0000,   0x2000, R|W|TX  ; Inform debugger that this section holds code
                             ; (.text).
ma 0x2000,   0x6000, RAM     ; No code here, STRB0
ma 0x87FE00, 0x200,  RAM     ; On-chip
ma 0x808000, 0x10,   RAM     ; Peripheral Bus Control - DMA
ma 0x808020, 0x20,   RAM     ; Peripheral Bus Control - Timers
ma 0x808040, 0x10,   RAM     ; Peripheral Bus Control - Serial Port 0
ma 0x808060, 0x10,   RAM     ; Peripheral Bus Control - External Memory
                             ; Interface
ma 0x900000, 0x8000, RAM     ; STRB1
reset
map on                       ; Make emulator aware of this memory
                             ; configuration.
?*0x808064 = 0xF0000         ; Set STRB0 control register to 32-bit memory
                             ; width, 32-bit data size.
?*0x808068 = 0x00000         ; Set STRB1 control register to 8-bit memory
                             ; width, 8-bit data size.
load sample.out              ; Configure STRB0 and STRB1 control registers
                             ; before loading code.


8-Bit Static Memory Allocation 

This section provides examples of 8-bit static buffer allocation and
associated linker configuration. The debugger batch file is identical to the
batch file in Example 4-3 and, therefore, is not shown.

The C code in Example 4-4 demonstrates the static allocation of two buffers
(1K and 4K 8-bit words) by defining a user section called .mydata8. This
section is used to hold a structure consisting of two arrays of data values.

Example 4-4. 8-Bit Static Buffer Allocation

#pragma DATA_SECTION(buffer8, ".mydata8")
struct bufferStruct {
    int in[1024];    /* element type not shown in the source; int assumed */
    int out[4096];   /* element type not shown in the source; int assumed */
} buffer8;

void main()
{
    /* Configure the STRB0 control register for 32-bit wide memory,
       32-bit data size. */
    *(volatile unsigned int *)0x808064 = 0xF0000;

    /* Configure the STRB1 control register for 8-bit wide memory,
       8-bit data size. */
    *(volatile unsigned int *)0x808068 = 0x00000;

    /* Process buffers. */
    callDSPoperation(buffer8.in, buffer8.out);
}


The linker command file in Example 4—5 allocates sections of the above C 
code into the desired memory configuration. 


Example 4-5. Linker Command File 


sample.obj                /* Input filename                */
-stack 8704               /* Set C system stack size.      */
-o sample.out             /* Specify output file.          */
-m sample.map             /* Specify map file.             */
MEMORY
{
    PRGRAM   : org = 0x0000,   len = 0x2000
    STRB0RAM : org = 0x2000,   len = 0x6000
    ONCHIRAM : org = 0x87FE00, len = 0x200
    STRB1RAM : org = 0x900000, len = 0x8000
}
SECTIONS
{
    .text    > PRGRAM     /* 32-bit data section */
    .cinit   > STRB0RAM   /* 32-bit data section */
    .const   > STRB0RAM   /* 32-bit data section */
    .bss     > STRB0RAM   /* 32-bit data section */
    .stack   > STRB0RAM   /* 32-bit data section */
    .mydata8 > STRB1RAM   /* 8-bit memory pool mapped to STRB1 */
}

4.7.5.2 Single External Memory Bank

Consider the case of a typical audio compression application written in C
that requires 32-bit data for the system stack and 16-bit data for the audio
buffers. In this case, the programmer can interface the ’C32 as shown in
Figure 4-34. This example assumes 32K 32-bit words of external memory. This
memory is further defined as containing 8.5K 32-bit words of stack and 8K
32-bit words of program space; both areas are mapped to STRB0 (program space
includes constants and global/static variables). Also, external memory
contains 32K 16-bit word data buffers that are mapped into STRB1.


Due to this mapping, the programmer must set the following: 


- STRB0 control register physical memory width to 32 bits and the data type
  size to 32 bits

- STRB configuration bit field to 1 (STRB0 control register = 002F0000h)

- STRB1 control register physical memory width to 32 bits and the data type
  size to 16 bits, that is, STRB1 control register = 000D0000h


Additionally, the PRGW pin must be pulled low to indicate 32-bit program 
memory width. 

Figure 4-34. Zero-Wait-State Interface for 32-Bit SRAMs with 16- and 32-Bit
Data Accesses

[Diagram not reproduced. Signals shown: the ’C32 pins A22 and A13 through A0
connected to address pins A14 and A13 through A0 of four 8-bit-wide SRAMs
that form the 32-bit-wide memory bank; R/W connected to WE; STRB0_B3 through
STRB0_B0; and data lines D(31-24), D(23-16), D(15-8), and D(7-0) connected to
the I/O(7-0) pins of the SRAMs.]


The external memory address pins A14 A13 ... A1 A0 are mapped to the ’C32’s
A22 A13 A12 ... A1 A0 pins. This mapping was selected to position the system
stack immediately after the ’C32’s internal RAM. Performance is improved
because the top of the stack resides in internal RAM, and the stack is
allowed to grow into external RAM. With this mapping, external memory
accesses in the range 4000h through 7FFFh read or write 16-bit data; memory
accesses in the range 0h through 3FFFh read or write 32-bit data. The PRGW
pin controls the program fetches.


Figure 4—35 shows the contents of external memory. Because of the address 
shift of the 'C32’s external memory interface, the memory map for the ’C32 
CPU is slightly different (see Figure 4-36). 

Figure 4-35. External Memory Map

Physical
address     Contents
0h          System stack area
1FFFh       (8K x 32 bits)
2000h       Program word 0
            Program word 1
            ...
3FFFh       Program word 8191
4000h       Data1        Data0
4001h       Data3        Data2
            ...
7FFFh       Data32767    Data32766

Note: For 32-bit data, physical address = logical address.
      For 16-bit data, physical address = logical address shifted right by 1
      (two 16-bit data values share each 32-bit physical word).

Figure 4-36. TMS320C32 Memory Map

Logical
address     Contents
0h
2000h       Program
3FFFh       (8K x 32 bits)
4000h
87FE00h     Internal RAM
87FFFFh     (512 x 32 bits)
880000h     System stack
881FFFh     (8K x 32 bits)
900000h     Data buffers
907FFFh     (32K x 16 bits)
FFFFFFh

Note: For 32-bit data, physical address = logical address.
      For 16-bit data, physical address = logical address shifted right by 1
      (two 16-bit data values share each 32-bit physical word).
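
The relationship between the two memory maps can be checked with a short
host-side routine. This is a sketch under stated assumptions, not TI code:
it assumes that a 16-bit access presents the logical address shifted right
by one bit on the address pins (which is what the two-values-per-word layout
of Figure 4-35 implies), that odd-numbered 16-bit data occupy the upper half
of each word, and that the SRAM address pins are wired as in Figure 4-34.
The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the physical SRAM word address reached by a logical address.
   data16 is nonzero for STRB1 accesses that use the 16-bit data size;
   *upper_half reports which half of the 32-bit word holds the datum. */
static uint32_t sram_word(uint32_t logical, int data16, int *upper_half)
{
    /* Address as presented on the 'C32 pins (assumed right shift). */
    uint32_t pins = data16 ? (logical >> 1) : logical;

    assert(upper_half != 0);
    *upper_half = data16 ? (int)(logical & 1u) : 0;

    /* SRAM A14 <- 'C32 A22, SRAM A13..A0 <- 'C32 A13..A0. */
    return (((pins >> 22) & 1u) << 14) | (pins & 0x3FFFu);
}
```

With these assumptions, program addresses 2000h through 3FFFh map to the
same physical addresses, the stack at 880000h through 881FFFh maps to
physical 0h through 1FFFh, and the data buffers at 900000h through 907FFFh
map to physical 4000h through 7FFFh, matching Figure 4-35.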

16-Bit Dynamic Memory Allocation

This section contains C code examples of 16-bit dynamic buffer allocation,
linker configuration, and a debugger batch file.

The following C code demonstrates the allocation of two buffers (1K and 4K
16-bit words) using the 16-bit dynamic memory allocation routines provided
by the runtime-support library.


Example 4-6. 16-Bit Dynamic Buffer Allocation 


#include <bus30.h>

void main()
{
    int   *buffer1;
    float *buffer2;

    /* Configure the STRB0 control register to STRB0 and STRB1 overlay. */
    /* 32-bit wide memory, 32-bit data size */
    /* If using the PRTS30 headers,
       BUS_ADDR->STRB0_gcontrol = STRB0_1_CNFG | MEMW_32 | DATA_32; */
    *(volatile unsigned int *)0x808064 = 0x2F0000;

    /* Configure STRB1 control register to 32-bit wide memory, 16-bit data
       size. */
    /* If using the PRTS30 headers,
       BUS_ADDR->STRB1_gcontrol = MEMW_32 | DATA_16; */
    *(volatile unsigned int *)0x808068 = 0xD0000;

    /* Allocate 1K 16-bit words in the 16-bit memory pool. */
    buffer1 = malloc16(1024 * sizeof(int));

    /* Allocate 4K 16-bit floats in the 16-bit memory pool. */
    buffer2 = malloc16(4096 * sizeof(float));

    /* Process buffers. */
    callDSPoperation(buffer1, buffer2);

    /* Free buffers. */
    free16(buffer2);
    free16(buffer1);
}


The linker command file in Example 4—7 allocates sections of the preceding 
C code into the memory configuration depicted in Figure 4-35 on page 4-82. 

Example 4-7. Linker Command File

sample.obj                /* Input filename                */
-heap16 32768             /* Set 16-bit memory pool size.  */
-stack 8704               /* Set C system stack size.      */
-o sample.out             /* Specify output file.          */
-m sample.map             /* Specify map file.             */
MEMORY
{
    STRB0RAM : org = 0x2000,   len = 0x2000
    STACKRAM : org = 0x87FE00, len = 0x2200
    STRB1RAM : org = 0x900000, len = 0x8000
}
SECTIONS
{
    .text   > STRB0RAM    /* 32-bit data section */
    .cinit  > STRB0RAM    /* 32-bit data section */
    .const  > STRB0RAM    /* 32-bit data section */
    .bss    > STRB0RAM    /* 32-bit data section */
    .stack  > STACKRAM    /* 32-bit data section */
    .sysm16 > STRB1RAM    /* 16-bit memory pool mapped to STRB1 */
}


The debugger batch file in Example 4-8 executes initialization commands that
configure the C source debugger to handle a ’C32 with the memory
configuration shown in Figure 4-36 on page 4-83.


Example 4—8. Debugger Batch File 


mr
sconfig init.clr
                             ; Define memory configuration.
ma 0x2000,   0x2000, R|W|TX  ; Inform debugger that this section holds code
                             ; (.text).
ma 0x87FE00, 0x2000, RAM
ma 0x900000, 0x8000, RAM
map on                       ; Make emulator aware of this memory
                             ; configuration.
?*0x808064 = 0x2F0000        ; Set STRB0 control register to STRB0 and STRB1
                             ; overlay.
                             ; 32-bit memory width, 32-bit data size
?*0x808068 = 0xD0000         ; Set STRB1 control register.
                             ; 32-bit memory width, 16-bit data size
load sample.out              ; Configure STRB0/STRB1 control registers before
                             ; loading code.

4.8 Booting a TMS320C32 Target System in a C Environment

A DSP system uses a boot procedure following power-up or reset to initialize
the system volatile memory (such as SRAM) with the application program/data
and to start execution of the application code. The SRAM loads from a
nonvolatile medium (EPROM) or from a PC development platform using a
debugger/loader program. The loader uses an emulator cable to move the load
file from the PC hard disk to the SRAM on the DSP target board. An EPROM
boot causes the DSP to start program execution directly from 16- or 32-bit
EPROM (microprocessor mode). Alternatively, a hard-wired on-chip boot loader
program copies the boot table from the 8-bit EPROM to internal or external
SRAM and then starts execution from the SRAM (microcomputer/boot loader
mode).

TI supports four ways to boot a DSP system following power-up/reset. Each
boot procedure uses a different combination of ’C32 silicon features,
software, and hardware tools. Each combination forms an integrated
development environment that includes features to support most system boot
requirements.

A boot development flow includes two major tasks:

1) Use C source debugger and assembly level tools to compile, assemble,
   and link the boot code/data to create a binary common object file format
   (COFF) executable object.

2) Load the COFF file into the DSP target system.

Generating the COFF file (linker output .out file) uses the same flow for
all boot methods.

4.8.1 Generating a COFF File

Generating a COFF file requires compiling the source code with the C
compiler, then assembling and linking the resulting assembly files with the
assembly level tools. Additional assembly files can be created with a text
editor or extracted from the RTS30 library. The linking process resolves all
external references between program files and generates the .out COFF file
subject to specified options (such as the -c or -cr boot options).


4.8.1.1 Compiler

Figure 4-37 on page 4-89 shows how one or more C files are compiled into
multiple assembly files. Each assembly file is constructed from former C
functions that were individually decomposed into standard logical sections:

- The program code is assigned to .text.
- The stack is assigned to .stack.
- Dynamically allocated memory is assigned to .sysmem.
- The switch tables are assigned to .const.
- Uninitialized variables are assigned to .bss.
- Initialized variables are assigned to .cinit.

If, following system reset, the program executes directly out of EPROM
(microprocessor mode), a separate assembly file holds the reset vector (and
possibly other interrupt vectors). The reset vector points to the address
contained in the c_int00 symbol that the linker resolves with the beginning
of the BOOT.ASM routine (from the RTS30 library).

4.8.1.2 Assembler

The assembler assembles all .asm files into their respective .obj files.
Since each .asm file may have a .text section fragment for each function in
the file, its .obj counterpart groups all the fragments into a single .text
section. This applies to all sections in that file. The results of the
assembler process are multiple .obj files composed of single instances of
all standard C sections. In addition to the object files generated by the
user, the subsequent boot procedures require another .obj file. The boot.asm
file can be extracted from the RTS30 library and assembled separately into
boot.obj. The boot.obj is the first routine executed following reset. It
initializes the C environment by setting up the system stack, processing
initialized variables, setting up the page pointer, and calling the main
function. While the boot.asm file is required for a C program, other files
may be extracted from the library, such as malloc.asm, which is used to
allocate additional memory at run time.

4.8.1.3 Linker

The linker assigns physical addresses to logical program sections from .obj
files. A linker command file defines the available physical memory segments
using the MEMORY directive, assigns one or more sections to individual
memory segments using the SECTIONS directive, and lists all object files
containing sections to be processed. The order in which object files are
listed is important and reflects the order in which individual sections are
stacked in physical memory. For that reason, the boot.obj file must always
be the first one listed, since it represents the execution entry point for
every C program. The boot.obj global symbol c_int00 provides the entry
address that can be resolved to other files that are linked with boot.obj
(for example, the vector file that needs an address for the reset vector).
Depending on the method, the linker can be invoked with the -c or -cr
option. These two options control how a C program’s initialized variables
are handled during the later stages of the boot process. See the
TMS320C3x/C4x Assembly Language Tools User’s Guide for more information.

Figure 4-37. Compile, Assemble, and Link Flow

4.8.1.4 The .out (COFF) File


After resolving the external references among all program sections, the
linker builds the .out file. The .out file is constructed in the binary COFF
format, and it contains all the sections listed in the linker SECTIONS
directive. It contains information about the program, information about how
to load it into the target DSP system, and symbol information for the
debugger that is later used to verify the code. All C and assembly symbols,
such as subroutine labels, can be made visible in the debugger window (by
embedding them in the COFF file), provided that they are declared as global
symbols and the appropriate options are used with the code generation tools.


Some .out sections contain only the starting addresses and no code or data. 
They include the .stack section for the system stack, the .sysmem section for 
dynamically allocated memory, and the .bss section for uninitialized data. The 
boot process also uses the .bss section as a destination for the initialized vari- 
ables that are originally stored in the .cinit section of the .out file. Although they 
contain no data, the .stack and .sysmem sections are included in .out to allow 
the debugger tools to verify that the physical memory for those sections exists 
on the target board. Other sections in the COFF file, such as .vectors, .const, 
and .text, contain the starting addresses and the contents of the sections. 
When the debugger loads the .text section into the target system, for example, 
the opcodes for all assembly instructions for the entire program are copied, be- 
ginning at the section starting address. 


The .cinit section is different because it contains initialized variables.
Once the .out file is generated, it can be burned into a 16- or 32-bit-wide
EPROM, and the program can start executing directly from that EPROM
following reset (in the microprocessor mode). But if the initialized
variables reside in the same EPROM, they are not really variables, since one
cannot write to an EPROM device and actually change the values of those
variables. For that reason, before user program execution begins, the
boot.asm library routine copies the initialized variables from the EPROM
.cinit section to the SRAM .bss section, one array of data at a time.
Figure 4-37 on page 4-89 shows that the .cinit section is divided into
individual array records; each array has a length, data content, and
destination address in the SRAM .bss section. The .bss section is the final
destination for initialized variables, while the .cinit EPROM section is a
temporary holding place for use before power-up/reset. The .cinit section
also stores the -c/-cr linker option selection for use in the later stages
of the boot process.
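
The record-by-record copy described above can be illustrated in C. The
following is a host-side sketch, not the boot.asm source (which is assembly
code in the RTS30 library). The record layout used here (a length word, a
destination word, then the data words, with a zero length ending the table)
is an assumption for illustration, and destination addresses are modeled as
indexes into an array that stands in for the .bss section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy every .cinit record into a simulated .bss image.
   Returns the number of records processed. */
static size_t process_cinit(const uint32_t *cinit,
                            uint32_t *bss, size_t bss_words)
{
    size_t records = 0;

    while (*cinit != 0) {             /* zero length ends the table (assumed) */
        uint32_t length = *cinit++;   /* number of data words in this record  */
        uint32_t dest   = *cinit++;   /* index into bss[] in this sketch      */
        uint32_t i;

        /* Refuse to write outside the simulated .bss image. */
        assert(dest <= bss_words && length <= bss_words - dest);

        for (i = 0; i < length; i++)
            bss[dest + i] = *cinit++;
        records++;
    }
    return records;
}
```

After the loop finishes, the .cinit image is no longer needed, which is why
the text calls it a temporary holding place.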

4.8.2 Loading the COFF File to the Target System

When the COFF file is loaded into the DSP target system, program and data
content, as well as control information, are extracted. Then the control
information is used to place the program/data content in target memory.
Some control information embedded in the COFF file may not apply directly to
the program/data content. For example, the COFF file may include a symbol
table for the debugger or a memory width control word for the on-chip boot
loader.


Using the debugger to load the COFF file to target memory requires
connecting the target board to the PC (on which the debugger is running)
with an emulator cable and pod and then transferring the COFF file with the
LOAD command. The linker -c/-cr options control processing of the .cinit
section during the load operation.


The COFF file can also be loaded to a target system from an EPROM. The
Hex30 utility converts the COFF file to an EPROM-programmer-compatible
file that can be programmed to the EPROM. In the microprocessor mode, the
program executes directly from the EPROM. In the microcomputer/boot loader
mode, the on-chip boot loader first expands the EPROM contents into target
SRAM and the program executes from SRAM. In either case, the C program
begins execution at the start of the boot.asm library routine to initialize
the C environment before the rest of the C program runs.


4.8.3 Debugger Boot 


Figure 4-38 on page 4-93 and Figure 4—39 on page 4-94 show how to load 
the COFF file into the target system using the debugger load command. 


The debugger is a standard TI software development tool that runs on a PC 
platform. The debugger accesses the target board through the PC emulator 
card and cable. The cable connects to the target board through a 12-pin con- 
nector that routes the signals to the DSP’s emulation pins. The emulation pins 
control the operation of the modular port scan device (MPSD) scan chain in 
the processor. Depending on the command issued by the debugger, the 
emulation circuitry in the scan chain stops or resumes processor operation, 
examines/loads registers or memory, sets breakpoints, or executes code one 
instruction at a time (called single-step execution). The debugger LOAD com- 
mand reads the COFF file from the PC hard drive, extracts program/data con- 
tent, and transfers it through the emulator cable to the target board’s memory. 

4.8.3.1 RAM Model (Linker -cr Option)

When the COFF file is loaded into the target board’s memory, most sections
in the file are processed by copying the program/data to the address defined
at the beginning of each section; however, the initialized variables in the
.cinit section are processed differently. If the COFF file is generated by
the linker using the -cr option, the .cinit section of the file is loaded
using the RAM model (see Figure 4-38). The RAM model assumes that the target
memory is composed exclusively of SRAM devices. Thus, the initialized
variables can be directly copied to the SRAM .bss section, one array at a
time, without first placing them in a temporary EPROM .cinit section. Once
the initialized variables have been loaded into SRAM, they can be read or
written to by the CPU without further initialization steps by boot.asm at
the beginning of C program execution.


4.8.3.2 ROM Model (Linker -c Option)

If the COFF file is created with the linker -c option, the loader places the
.cinit section in the target memory according to the ROM model. The ROM
model copies the .cinit section as one block to the address specified at the
beginning of the same .cinit section. Following the load operation, the ROM
model expects the boot.asm routine (at the beginning of the C program) to
further process the .cinit section by copying its contents to the SRAM .bss
section, one array at a time. After the COFF load operation, the memory
content is the same as that created by the RAM model with one exception: the
target SRAM still contains the temporary .cinit section, which serves no
purpose after it is processed by boot.asm. The ROM model can still be
useful; for example, it can simulate the microprocessor-mode EPROM boot (see
Figure 4-39). During the development cycle, instead of burning a new EPROM
each time the code is modified, the EPROM can be removed and replaced with
an equivalent SRAM device (by reconfiguring jumpers). The ROM model allows
use of the loader to quickly load and debug the modified code while
preserving the bus activity at power up to simulate an EPROM boot.

Figure 4-38. Loading C Object File into TMS320C32 Memory (Linker -cr Option)

Figure 4-39. Loading C Object File into TMS320C32 Memory (Linker -c Option)

4.8.4 EPROM Boot


Booting a DSP target board from C code stored in nonvolatile memory and ac- 
cessible to the DSP can be done in two ways. If the DSP is powered up in the 
microprocessor mode, the reset causes the program to start execution from 
32- or 16-bit EPROM by fetching the reset vector from memory address 
000000h and branching to the reset interrupt service routine (ISR) pointed to 
by that vector. 


On the other hand, if the DSP is powered up in the microcomputer/boot loader 
mode, program execution starts with the on-chip boot loader program. The 
boot loader reads the COFF file from an 8-bit EPROM and expands it to the 
system SRAM from which it can be executed (16 or 32 bits wide). In either 
case, program entry occurs at the beginning of the boot.asm library routine to 
initialize the C environment prior to execution of the C code. 


4.8.4.1 Microprocessor Mode (Linker -c Option)

Before the binary COFF file can be burned into an EPROM, it must be
converted to an ASCII format that an EPROM programmer can recognize (see
Figure 4-40 on page 4-97). The hex conversion utility converts COFF files to
a programmer object file format such as Intel™ Hex. The EPROM programmer
uses the converted files to program one or more EPROMs that can be inserted
into the DSP target board.
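
To make the idea of an ASCII programmer format concrete, the sketch below
formats a single Intel Hex data record: a colon, a byte count, a 16-bit load
address, record type 00, the data bytes, and a checksum equal to the two's
complement of the sum of all preceding bytes. It illustrates the record
format only; it is not the hex conversion utility's implementation, and the
function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format one Intel Hex data record (type 00) into out.
   out must hold at least 11 + 2*len + 1 characters.
   Returns the number of characters written (excluding the NUL). */
static int ihex_data_record(char *out, uint16_t addr,
                            const uint8_t *data, unsigned len)
{
    /* Running byte sum: count + address high + address low + type (0). */
    unsigned sum = len + (addr >> 8) + (addr & 0xFFu);
    unsigned i;
    int n;

    assert(len <= 255u);
    n = sprintf(out, ":%02X%04X00", len, (unsigned)addr);
    for (i = 0; i < len; i++) {
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    /* Checksum: two's complement of the low byte of the sum. */
    n += sprintf(out + n, "%02X", (0x100u - (sum & 0xFFu)) & 0xFFu);
    return n;
}
```

A complete file would contain many such records followed by an end-of-file
record; how the 32-bit ’C32 words are split across records and devices is
decided by the conversion utility and is not modeled here.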


If the linker -c option is used to create the COFF file (ROM model), the hex
utility copies the .cinit section directly into the programmer object file
without processing its content. In other words, the .cinit section in the
programmed EPROM contains the initialized data as well as destination
addresses and lengths in .bss for individual .cinit data arrays. To start
program execution from EPROM at power up, the DSP must be configured in the
microprocessor mode by pulling the MCBL/MP pin low. Triggered by the
low-to-high transition of the RESET pin, the DSP executes the reset vector
fetch read cycle. The reset vector points to the boot.asm routine, which is
executed next. The linker -c option sets a control bit in the .cinit section
of the COFF file.


When the boot.asm program processes the .cinit section, it checks the -c/-cr
control bit. The -c option (ROM model) causes boot.asm to copy the contents
of each array within the .cinit section to its destination in the .bss
section mapped to SRAM. The initialized variables must be copied from EPROM
to SRAM at the beginning of program execution, because they cannot be
modified in EPROM (variable data must be changeable during program
execution).

4.8.4.2 Microcomputer/Boot Loader Mode (Linker -cr Option)

The ’C32 features an on-chip hardwired boot loader program in the internal
programmable logic array (PLA). The boot loader reduces the DSP target
board cost by replacing multiple fast EPROMs with a single 8-bit slow
(inexpensive) EPROM. Because the ’C32 cannot execute code from memory that
is only 8 bits wide, the on-chip boot loader program reads the boot table
from the byte-wide EPROM and reconstructs all sections of the original COFF
file one byte at a time before placing the program/data in SRAM (see
Figure 4-41 on page 4-98).
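
Reconstructing a word one byte at a time amounts to shifting successive
EPROM bytes into place. The sketch below shows the idea on a host PC. The
byte order (least significant byte first) and the function name are
assumptions made for illustration; the order actually used by the on-chip
boot loader should be taken from the boot loader documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble one 32-bit word from four consecutive bytes of a byte-wide
   EPROM image, least significant byte first (assumed order). */
static uint32_t word_from_bytes(const uint8_t *eprom)
{
    assert(eprom != 0);
    return  (uint32_t)eprom[0]        |
           ((uint32_t)eprom[1] << 8)  |
           ((uint32_t)eprom[2] << 16) |
           ((uint32_t)eprom[3] << 24);
}
```

Four byte reads are therefore needed for every 32-bit program or data word,
which is why a slow EPROM lengthens boot time but does not affect execution
speed once the code is running from SRAM.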


To power up the DSP in the boot loader mode, the MCBL/MP pin must be held 
high when the RESET signal is deasserted. At that stage, the DSP starts 
executing the boot loader code from internal address 000045h. Immediately 
after it starts execution, the boot loader checks the interrupt flag (IF) register. 
All interrupts are disabled and remain disabled until the application program 
enables them. Depending on which external interrupt is asserted, the boot 
loader looks for the boot table at one of three external memory locations or at 
the serial port. The interrupt pins carry a message to the boot loader telling it 
where to get the boot table after reset. 


The boot table structure resembles the COFF file from which it was derived
by the hex conversion utility. The main feature that distinguishes the boot
table from a regular hex utility output (such as the microprocessor mode
boot example) is that in addition to the contents of the COFF sections, the
boot table includes special control words for the on-chip boot loader
program to instruct it on how to assemble and load those sections. Each
section is built into a block preceded by three control words: block size,
destination address, and destination memory width/data size. Multiple blocks
can be transferred to selected parts of the DSP memory map. To format the
COFF file into the boot table, the program section to be booted must be
identified to the hex conversion utility with the SECTIONS directive. The
boot table is constructed of the COFF sections identified in the SECTIONS
directive and marked with the boot option (see Figure 4-41).
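
The block structure described above can be walked with a few lines of C.
This host-side sketch assumes that the table has already been assembled into
32-bit words, that any header words preceding the first block have been
skipped, that the three control words appear in the order given in the text,
and that a block size of zero marks the end of the table. The type and
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t size;          /* number of data words in the block         */
    uint32_t dest;          /* destination address in the DSP memory map */
    uint32_t control;       /* destination memory width/data size word   */
    const uint32_t *data;   /* first data word of the block              */
} boot_block;

/* Collect the blocks of a boot table image.
   Returns the number of blocks found, or -1 if the image is truncated. */
static int walk_boot_blocks(const uint32_t *tbl, size_t words,
                            boot_block *out, int max_blocks)
{
    size_t pos = 0;
    int count = 0;

    assert(tbl != 0 && out != 0);
    while (pos < words && tbl[pos] != 0 && count < max_blocks) {
        boot_block b;

        if (words - pos < 3)
            return -1;                  /* control words missing */
        b.size    = tbl[pos];
        b.dest    = tbl[pos + 1];
        b.control = tbl[pos + 2];
        b.data    = &tbl[pos + 3];
        if (words - pos - 3 < b.size)
            return -1;                  /* data words missing */
        out[count++] = b;
        pos += 3 + (size_t)b.size;
    }
    return count;
}
```

A loader built on this walk would program the destination memory width/data
size for each block and then copy the block's data words to its destination
address.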


If the linker uses the -cr option to create the COFF file, the hex utility
processes the COFF .cinit section and assigns the addresses in the .bss
section to the corresponding .cinit arrays in the boot table. Every C
program starts execution with the boot.asm routine, but because one of the
boot.asm control flags indicates that the COFF file was created with the
linker -cr option, the code skips transfer of .cinit contents to .bss. The
hex utility performs that task by placing all the initialized variables in
.bss while creating the boot table without relying on boot.asm to make the
transfer at run time (see Figure 4-41).

Figure 4-40. 32-Bit EPROM Boot in the Microprocessor Mode (Linker -c Option)

Figure 4-41. 8-Bit EPROM Boot Using the On-Chip Boot Loader (Linker -cr
Option)

4.8.5 Boot Table Memory Considerations


There is a significant difference in the methods of interfacing the external 
memory holding the boot table and the program/data memory used during nor- 
mal code execution. The address presented on the ’C32’s pins may be shifted 
by one or two bits, depending on the size of the memory bank (see 
Figure 4—42), but the external memory holding the boot table must have no ad- 
dress shift relative to the C32 address pins, regardless of the width of the boot 
memory (see Figure 4—43). The boot loader program reads the boot table 
memory width from the first word of the boot table. It reads the boot table con- 
tents as 32-bit data, and, depending on the memory width, it reconstructs the 
program and data before sending them to the memory map. Because of this 
difference in the address shift, the byte-wide EPROM containing the boot table 
is not best suited to store normal data unless special hardware is added to han- 
dle the address shift. 

Figure 4-42. Memory Configuration for Normal Program Execution

Figure 4-43. Boot Table Memory Configuration

Note: For external memory used during normal program execution, the amount
      of external address shift depends only on the width of the memory
      bank.

4.8.6 Host Load


While some DSP systems stand alone, others may be embedded DSPs con- 
trolled by a host, such as a microcontroller or another DSP. During system 
power up, the DSP boot table may be transferred from the host to the DSP 
through a serial port or through a byte-wide latch. This eliminates the need for 
a dedicated boot EPROM on the DSP side of the system. On the host side, the 
DSP boot table may be temporarily stored in an EPROM, prior to the DSP boot. 
Following reset, the host transfers the boot table to the DSP to initialize it and 
start program execution. 


4.8.6.1 Boot From Serial Port 


If the DSP powers up in the microcomputer/boot loader mode (MCBL/MP
high), a low level on the INT3 pin and a high level on all other INTx pins
cause the on-chip boot loader program to read the boot table from the serial
port. Most microcontrollers also feature a serial port, and in many cases the
two ports can be connected directly without additional glue logic for an
economical host/DSP interface. Following the boot, the serial channel can also
be used by the host to send/receive data and to control the operation of the
DSP (see Figure 4-44 on page 4-104). Generating the boot table requires
linking the object files with the -cr option (RAM model) and then using the
boot keyword in the hex conversion utility’s SECTIONS directive to identify
the COFF sections to be included in the boot table.


4.8.6.2 Boot From a Latch 
If the host processor does not have a serial port, the DSP can be booted from
the host using an 8-bit latch. During the boot operation, the host feeds the boot 
table bytes to the latch on one side, while the DSP reads the data from the oth- 
er. Following reset, interrupts 0, 1, and 2 direct the DSP boot loader to the latch 
address. The same interrupts cause the boot loader to read from the parallel 
port, so some control/decode logic is required to make the DSP read from 
memory instead of from a latch. The same glue logic must also be connected 
to the host side of the latch to ensure proper data-transfer synchronization be- 
tween two asynchronous systems (see Figure 4-45 on page 4-105). At power 
up, the DSP boot table most likely resides in the host’s EPROM, and the host 
outputs the boot table to the latch one byte at a time following reset. Creating 
the boot table for this operation uses the same linker/COFF options as for the 
host/serial boot and the direct EPROM boot. 
4.8.6.3 Asynchronous Boot From a Communications Port


If the host processor has an asynchronous communications capability, then
the ’C32 can make a glueless connection to the host’s communication port
(see Figure 4-46 on page 4-106). In addition to the data bus, three ’C32 pins
are involved in the asynchronous boot: XF0, XF1, and IACK. The XF1 pin
serves as the data ready input to the ’C32, and XF0 is the data acknowledge.


The IACK pin pulses when there is no valid data present on the data lines
(which are needed for the ’C4x comm-port interface). For boot loader mode,
it is assumed that the host (such as a ’C4x) connects directly to the data ready
and data acknowledge control lines. The host drives the data ready signal low
to indicate to the DSP that the next byte of the boot table has been placed on
the data lines. The DSP responds by pulling the data acknowledge signal low
after reading the data. When the host sees the data acknowledge signal, it
stops driving the data bus and brings the data ready line high. To complete the
handshaking transaction, the DSP brings the data acknowledge signal high to
request the next byte from the host. The boot table for this type of boot
operation is created with the linker -cr option (RAM model) and the boot
keyword of the hex conversion utility SECTIONS directive, the same options
used for other boot load procedures involving the on-chip boot loader program.
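
The handshake described above can be traced with a small host-side C model. This is only a sketch for following the order of events: the boot_link structure, its field names, and the two functions are hypothetical, and real hardware sequences these steps through the XF1, XF0, and data pins rather than through function calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the signals used during the asynchronous boot. */
typedef struct {
    int data_ready;      /* driven by the host, active low (XF1 input)  */
    int data_ack;        /* driven by the DSP, active low (XF0 output)  */
    int bus_driven;      /* nonzero while the host drives the data bus  */
    unsigned char bus;   /* byte currently on the data lines            */
} boot_link;

/* One complete handshake; returns the byte as the DSP receives it. */
static unsigned char transfer_byte(boot_link *l, unsigned char value)
{
    unsigned char received;

    assert(l->data_ready == 1 && l->data_ack == 1);  /* both lines idle high */

    l->bus = value;          /* host places the next byte on the data lines */
    l->bus_driven = 1;
    l->data_ready = 0;       /* host drives data ready low                  */

    received = l->bus;       /* DSP reads the data                          */
    l->data_ack = 0;         /* DSP pulls data acknowledge low              */

    l->bus_driven = 0;       /* host stops driving the data bus             */
    l->data_ready = 1;       /* host brings data ready high                 */

    l->data_ack = 1;         /* DSP requests the next byte                  */
    return received;
}

/* Send a whole boot table one byte at a time; returns the byte count. */
static size_t send_boot_table(boot_link *l, const unsigned char *table,
                              size_t len, unsigned char *dsp_copy)
{
    size_t i;

    for (i = 0; i < len; i++)
        dsp_copy[i] = transfer_byte(l, table[i]);
    return i;
}
```

Both control lines return to their idle (high) state after every byte, which is why the host can start the next transfer without any extra synchronization.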
4.9 TMS320C30 Addressing up to 68 Gigawords


The ’C30 primary bus has 24 address lines, which allow addressing up to
16 megawords of memory. The ’C30 expansion bus has 13 address lines,
addressing 8K words. The two sets of address lines, the expansion bus lines
[XA(12-0)] and the primary bus lines [A(23-0)], can be used simultaneously to
extend the address to 36 bits: A(23) is left unconnected (see Figure 4-47), so
23 primary address bits and 13 expansion address bits make up the address.
This is accomplished by using the feature of the ’C3x family that holds the
last address bits on an external bus until a new external access occurs. In
effect, the address bus works as a latch. Figure 4-47 shows how these two
busses are combined. The following parallel instruction accomplishes this
task:


        STI     Rx,*ARn         ; address MSTRB while loading a
                                ; value from STRB memory
||      LDI     *ARp,Rq


where:
    Rx and Rq designate registers R0 to R7 (but not the same register)
    ARn and ARp designate auxiliary registers AR0 to AR7 (but not the same
    register).
Note:

ARn contains the 8-Mword segment address plus 800000h. ARp contains
the address within the 8-Mword segment and is between 0 and 7FFFFFh.


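Figure 4-47. TMS320C30 Combination of Primary and Expansion Busses to Address 68 Gigawords

[Figure: ’C30 connected to a memory array. A(23) is not connected; A(22:0)
drive the memory array address inputs A(22:0); STRB drives the memory array
chip select (CS); MSTRB and XA(12:0) are the expansion bus signals, with
XA(12:0) supplying the remaining memory array address inputs.]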
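
The arithmetic behind the 36-bit address can be sketched in host-side C. The type and function names below are hypothetical; the sketch only mirrors the note above (ARn holds the segment number plus 800000h, ARp holds the offset within the 8-Mword segment) and does not access any hardware.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_BITS   13u          /* XA(12:0): segment number               */
#define OFF_BITS   23u          /* A(22:0): offset within 8-Mword segment */
#define XBUS_BASE  0x800000u    /* offset added to the segment in ARn     */

/* Hypothetical pair of values to load into ARn and ARp. */
typedef struct {
    uint32_t arn;   /* segment number plus 800000h */
    uint32_t arp;   /* offset, 0 to 7FFFFFh        */
} c30_ext_addr;

/* Split a 36-bit word address into the two auxiliary register values. */
static c30_ext_addr split_address(uint64_t word_addr)
{
    c30_ext_addr r;

    assert(word_addr < ((uint64_t)1 << (SEG_BITS + OFF_BITS)));
    r.arn = XBUS_BASE + (uint32_t)(word_addr >> OFF_BITS);
    r.arp = (uint32_t)(word_addr & ((1u << OFF_BITS) - 1u));
    return r;
}

/* Recombine the two register values into the 36-bit word address. */
static uint64_t join_address(c30_ext_addr a)
{
    return ((uint64_t)(a.arn - XBUS_BASE) << OFF_BITS) | a.arp;
}
```

With 13 + 23 = 36 address bits, the addressable range is 2^36 = 68,719,476,736 words, which is the 68 gigawords of the section title.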
Chapter 5


Programming Tips 


Programming style reflects personal preference. The purpose of this chapter 
is not to impose any particular style, but to highlight features of the ’C3x that 
can produce faster and/or shorter programs. The tips cover the C compiler, as- 
sembly language programming, and low-power mode wakeup. 


Topic
5.1 Hints for Optimizing C Code
5.2 Hints for Assembly Coding
5.3 Low-Power Mode Wakeup Example
5.4 Bit-Reversed Addressing in C
5.5 Sharing Header Files in C and Assembly
5.6 Addressing Peripherals as Data Structures in C
5.7 Linking C Data Objects Separate From the .bss Section
5.8 Interrupts in C


5.1 Hints for Optimizing C Code 


The ’C3x was designed with a large register file, software stack, and memory 
space that easily supports the floating point C compiler. The C compiler trans- 
lates ANSI C programs into assembly language source code. It also increases 
code portability and decreases application porting time. 


After writing your application in C, debug the program and determine
whether it runs efficiently. If the program does not run efficiently:

- Use the optimizer with the -o2 or -o3 option when compiling.
- Use registers to pass parameters (-ms compiling option).
- Use inlining (-x compiling option).
- Remove the -g option when compiling.
- Follow some of the efficient code generation tips listed below.
- Identify places where most of the execution time is spent and optimize
  these areas by writing assembly language routines that implement the
  functions. Call the routines from the C program as C functions.


The efficiency of the code generated by the floating-point compiler depends
to a large extent on the compiler options used when writing your C code. There
are specific constructs that can vastly improve the compiler’s effectiveness:


- Use register variables for often-used variables. This is particularly true
  for pointer variables. Example 5-1 shows a code fragment that exchanges
  one object in memory with another.


Example 5-1. Exchanging Objects in Memory

    register float *src, *dest, temp;
    do
    {
        temp = *++src;
        *src = *++dest;
        *dest = temp;
    }
    while (--n);


- Precompute subexpressions. This especially applies to array references
  in loops. Assign commonly used expressions to register variables,
  where possible.


- Use *++ to step through arrays rather than using an index to recalculate
  the address each time through a loop.




As an example of the previous two points, consider the loops in Example 5-2.


Example 5-2. Optimizing a Loop

    /* loop 1 */
    main()
    {
        float a[10], b[10];
        int i;
        for (i = 0; i < 10; ++i)
            a[i] = (a[i] * 20) + b[i];
    }

    /* loop 2 */
    main()
    {
        float a[10], b[10];
        int i;
        register float *p = a, *q = b;
        for (i = 0; i < 10; ++i)
            *p++ = (*p * 20) + *q++;
    }


Loop 1 executes in 19 cycles. Loop 2, which is the equivalent of loop 1,
executes in 12 cycles.


- Use structure assignments to copy blocks of data. The compiler
  generates very efficient code for structure assignments, so nest objects
  within structures and use simple assignments to copy them.


- Avoid large local frames and declare the most often used local
  variables first. The compiler uses indirect addressing with an 8-bit
  offset to access local data. To access objects on the local frame with
  offsets greater than 255, the compiler must first load the offset into an
  index register. This requires one extra instruction and incurs two cycles
  of pipeline delay.
- Avoid the large model. The large model is inefficient because the
  compiler reloads the data-page pointer (DP) before each access to a global
  or static variable. If you have large array objects, use malloc() to
  dynamically allocate them and access them via pointers rather than
  declaring them globally. Example 5-3 illustrates two methods for
  allocating large array objects.


Example 5-3. Allocating Large Array Objects

    /* Inefficient Method */
    int a[1000000];

    a[i] = 10;                              /* Inefficient */

    /* Efficient Method */
    int *a = (int *)malloc(1000000);        /* Efficient */
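
As a host-testable sketch of the pointer-stepping hint illustrated by Example 5-2, the two functions below compute the same result with an index and with stepped register pointers. The function names are hypothetical, and the cycle counts quoted above apply only to ’C3x compiler output, not to this sketch. Unlike loop 2 as printed, the increment of p is kept out of the assignment expression so that the result does not depend on evaluation order.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form, as in loop 1: the addresses of a[i] and b[i] are
   recomputed from the index on every pass. */
static void scale_add_indexed(float *a, const float *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        a[i] = (a[i] * 20) + b[i];
}

/* Pointer-stepping form, as in loop 2: two register pointers walk
   through the arrays, so no index arithmetic is needed. */
static void scale_add_stepped(float *a, const float *b, size_t n)
{
    register float *p = a;
    register const float *q = b;

    assert(a != NULL && b != NULL);
    for (; n != 0; --n, ++p)
        *p = (*p * 20) + *q++;
}
```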
5.2 Hints for Assembly Coding


Each program has unique requirements. Not all possible optimizations are 
appropriate in every case. You can use the suggestions in this section as a 
checklist of available software tools. 




Use delayed branches. Delayed branches execute in a single cycle; reg- 
ular branches execute in four cycles. The next three instructions are exe- 
cuted whether the branch is taken or not. If fewer than three instructions 
are required, use the delayed branch and append No-operation instruc- 
tions (NOPs). A reduction in machine cycles still occurs. 


Apply the repeat single/block construct. In this way, loops are achieved 
with no overhead. Nesting such constructs does not normally increase 
efficiency, so try to use the feature on the most often performed loop. Note 
that the RPTS instruction is not interruptible and the executed instruction 
is not refetched for execution. This frees the buses for operand fetches. 


Use parallel instructions. It is possible to perform a multiply in parallel 
with an add (or subtract) and to execute stores in parallel with any multiply 
or arithmetic logic unit (ALU) operation. This increases the number of 
operations executed in a single cycle. For maximum efficiency, observe 
the addressing modes used in parallel instructions and arrange the data 
appropriately. It is possible to have loads in parallel with any multiply or add
(or subtract) by multiplying by 1 or adding a 0. Therefore, to implement 
parallel instructions with a data load, substitute a multiply or an add 
instruction with one extra register containing 1 or 0, respectively, in place 
of a load instruction. 


Maximize the use of registers. The registers are an efficient way to 
access scratch-pad memory. Extensive use of the register file facilitates 
the use of parallel instructions and helps avoid pipeline conflicts when you 
use the registers in addressing modes. 


Use the cache. This is especially important in conjunction with slow exter- 
nal memory. The cache is transparent to the user, so make sure that it is 
enabled. 


Use internal memory instead of external memory. The internal 
memory (2K x 32 bits RAM and 4K x 32 bits ROM) is considerably faster 
to access. In a single cycle, two operands can be brought from internal 
memory. You can maximize performance if you use the direct memory ac- 
cess (DMA) in parallel with the CPU to transfer data to internal memory 
before you operate on it. 


Avoid pipeline conflicts. For time-critical operations, make sure you do 
not miss any cycles because of pipeline conflicts. 
The preceding checklist is not exhaustive, and it does not address the detailed
features outlined in other chapters of this manual. To learn how to exploit the 
full power of the ’C3x, study the architecture, hardware configuration, and 
instruction set of the device described in the TMS320C3x User’s Guide. 




5.3 Low-Power Mode Wakeup Example 
There are two instructions by which the ’C31, ’'LC31, and ’C32 are placed in 
the low-power consumption mode: 


- IDLE2
- LOPOWER


The LOPOWER instruction slows down the H1/H3 clock by a factor of 16 dur- 
ing the read phase of the instruction. The MAXSPEED instruction wakes the 
device from the low-power mode and returns it to full frequency during 
MAXSPEED’s read cycle. However, the H1/H3 clock may resume in the phase 
opposite to the one it was in before the clocks were shut down. 


The IDLE2 instruction has the same functions that the IDLE instruction has, 
except that the clock is stopped during the execute phase of the IDLE2 instruc- 
tion. The clock pin stops with H1 high and H3 low. The status of all the signals 
remains the same as in the execute phase of the IDLE2 instruction. In emula- 
tion mode, however, the clocks continue to run, and IDLE2 operates identically 
to IDLE. The external interrupts INT(0-3) are the only signals that start up the
processor from the mode the device was in. Therefore, you must enable the 
external interrupt before going to IDLE2 power-down mode (see 
Example 5-4). If the proper external interrupt is not set up before executing 
IDLE2 to power down, the only way to wake up the processor is with a device 
reset. 
Example 5-4. Setup of IDLE2 Power-Down Mode Wakeup

*
* TITLE IDLE2 POWER-DOWN MODE WAKEUP ROUTINE SETUP
*
* THIS EXAMPLE SETS UP THE EXTERNAL INTERRUPT 0, INT0, BEFORE
* EXECUTING THE IDLE2 INSTRUCTION. WHEN THE INT0 SIGNAL IS RECEIVED
* LATER, THE PROCESSOR WILL RESUME FROM ITS PREVIOUS
* STATE. NOTE: THE "INTRPT" SECTION IS MAPPED FROM THE
* ADDRESS 0 FOR THE RESET AND INTERRUPT VECTORS.
*
         .sect   "INTRPT"
RESET    .word   START          ; Reset vector
INT0     .word   INT0_ISR       ; INT0 interrupt vector
INT1     .word   INT1_ISR       ; INT1 interrupt vector
INT2     .word   INT2_ISR       ; INT2 interrupt vector
INT3     .word   INT3_ISR       ; INT3 interrupt vector
         .text

START    LDP     @SP_ADR
         LDI     @SP_ADR,SP     ; Set up stack pointer
         OR      01h,IE         ; Enable INT0
         IDLE2                  ; Set GIE = 1 and stop clock
INT0_ISR RETI                   ; Return to instruction after IDLE2


There is one cycle of delay while waking up the processor from the IDLE2 
power-down mode before the clocks start up. This adds one extra cycle from
the time the interrupt pin goes low until the interrupt is taken. The interrupt pin 
needs to be low for at least two cycles. The clocks may start up in the phase 
opposite the phase that they were in before the clocks were stopped. 




5.4 Bit-Reversed Addressing in C 


The C language does not have any construct to take advantage of the
bit-reversed addressing feature of the ’C3x. To use this feature from C, add
assembly instructions to the C code, as shown in Figure 5-1.


Figure 5-1. Bit-Reversed Addressing in C Code


#define N 16

int x[N] = { 0,8,4,12,2,10,6,14,1,9,5,13,3,11,7,15 };
int y[N] = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 };

/* int bitrev(int m, int n); */

void main()
{
    int i;

    asm("   PUSH AR5");
    asm("   PUSH AR0");
    asm("   LDI  8,IR0           ; Initialize IR0 to 1/2 N");
    asm("   LDI  @CONST+0,AR5    ; AR5 <- address of x[]");
    asm("   LDI  @CONST+1,AR0    ; AR0 <- address of y[]");

    for (i = 0; i < N; i++) {
        /* y[bitrev(i,N)] = x[i]; */
        asm("   LDI  *AR5++(IR0)b,R0");
        asm("   STI  R0,*AR0++");
    }

    asm("   POP  AR0");
    asm("   POP  AR5");

    /* These statements place x and y in .bss and make their
       addresses available via the CONST table. */
    asm("   .bss   CONST,2");
    asm("   .sect  \".cinit\"");
    asm("   .word  2,CONST");
    asm("   .word  _x");
    asm("   .word  _y");
}
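
For checking the result of Figure 5-1 on a host, the same reordering can be written in portable C. This is a minimal sketch, not ’C3x code: bitrev here is a hypothetical software stand-in for the hardware bit-reversed address update, and bitrev_copy is a hypothetical helper that performs the load/store pair of the loop.

```c
#include <assert.h>

/* Reverse the low log2(n) bits of m; n must be a power of two. */
static unsigned bitrev(unsigned m, unsigned n)
{
    unsigned r = 0;

    assert(n != 0 && (n & (n - 1)) == 0);
    for (n >>= 1; n != 0; n >>= 1) {
        r = (r << 1) | (m & 1u);   /* move the lowest bit of m into r */
        m >>= 1;
    }
    return r;
}

/* Read x in bit-reversed order and store to y in linear order, as the
   LDI *AR5++(IR0)b / STI *AR0++ pair does in Figure 5-1. */
static void bitrev_copy(const int *x, int *y, unsigned n)
{
    unsigned i;

    for (i = 0; i < n; i++)
        y[i] = x[bitrev(i, n)];
}
```

Applied to the x[] initializer of Figure 5-1 with n = 16, bitrev_copy fills y[] with the values 0 through 15 in ascending order.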
5.5 Sharing Header Files in C and Assembly


Sometimes it is useful to be able to define named constants that can be used 
in both C and assembly language. 


One method is to have separate header files that define the same symbols: 
a C include file with #define directives and an assembler include file with .set
or .asg directives. However, it is more convenient to have a single, shared 
header file that defines symbols once for C and assembly. 


Figure 5-2 shows how a file can be used normally as a C include file and also
to generate an assembler include file. Compiling the file with ASMDEFS
defined generates the assembler include file; use the following command:


cl30 -dASMDEFS -k defs.h


Figure 5-2. Input File defs.h

#define PI 3.14
#define E  2.72

#ifdef ASMDEFS    /* IF DEFINED, CREATE .asg DIRECTIVES */
#define ASM_ASG(sym) asm("\t.asg\t" VAL(sym) "," #sym)
#define VAL(sym) #sym

ASM_ASG(PI);
ASM_ASG(E);

#endif /*ASMDEFS*/


The output is the file defs.asm, which contains .asg directives for your symbols
(see Figure 5-3).


Figure 5-3. Output File defs.asm

; ... <compiler-generated header stuff> ...
        .asg    3.14,PI
        .asg    2.72,E


You can then use .include in your assembly modules. The same technique can 
be used to create .set directives rather than .asg. 
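
The technique relies on two levels of macro expansion: VAL(sym) turns the expanded value into a string, while #sym keeps the symbol name. The host-side sketch below builds the same directive text as ordinary strings so the expansion can be inspected without the ’C3x tools; ASG_TEXT, asg_lines, and asg_text_matches are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <string.h>

#define PI 3.14
#define E  2.72

/* VAL receives the already-expanded value (3.14) and stringizes it;
   #sym in ASG_TEXT stringizes the unexpanded name (PI). */
#define VAL(sym)       #sym
#define ASG_TEXT(sym)  "\t.asg\t" VAL(sym) "," #sym

/* Directive text for each shared constant. */
static const char *const asg_lines[] = {
    ASG_TEXT(PI),
    ASG_TEXT(E)
};

/* Returns nonzero when the generated text equals the expected line. */
static int asg_text_matches(const char *expected, const char *generated)
{
    assert(expected != NULL && generated != NULL);
    return strcmp(expected, generated) == 0;
}
```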




5.6 Addressing Peripherals as Data Structures in C 


A data structure is usually assigned to the .bss section by the C compiler. A 
.bss section stores global and statically allocated variables. A peripheral, such 
as a serial port, has memory-mapped control registers with addresses differ- 
ent from .bss. To manipulate a memory-mapped peripheral register in C, follow 
one of the methods listed below. 


- Method 1: Use a pointer to the peripheral.

      Pointer --------> Address = 0x808000
                        Peripheral as memory locations

  1) Declare a structure that logically represents the memory locations of
     the peripheral.

         struct controller {
             unsigned int status;

         };

  2) Declare a pointer to the structure and initialize it to the
     peripheral’s address.

         struct controller *IFperipheral = (struct controller *)0x808000;

  3) In your code, access the peripheral’s memory values indirectly.

         IFperipheral->status = 0;

- Method 2: Place the structure in its own section.

  1) Declare a peripheral instead of a pointer.

         struct controller IFperiph;

  2) Use inline assembly to give the structure its own section.

         asm("_IFperiph .usect \"periph\", 128");
         /* 128 is size of struct */

     This creates a user-defined section that can be linked to any address.

  3) Use your linker command file to map the section to memory.

         periph: load = 0x808000

  4) Address the structure elements directly.

         IFperiph.status = 0;

Method 1 is very useful for addressing peripherals or memory buffers that are
device specific. Method 2 is preferred for addressing peripherals or memory
buffers that are not device specific (that is, peripherals that are user
specified). Method 2 leaves the task of mapping and aligning user-specific
peripherals and/or memory buffers to the linker. The choice depends on your
individual application.


See section 5.7 for another method of placing the structure in its own section 
using #pragma directives. 
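
Method 1 can be tried on a host by pointing the structure at ordinary memory instead of a real peripheral. The sketch below assumes a C99 host compiler; the control field, the volatile qualifiers, and the two function names are hypothetical additions for the illustration, and on the target the base address would be the peripheral address (0x808000 in the example above).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout; the real layout comes from the
   peripheral's documentation. volatile keeps every access in the code. */
struct controller {
    volatile uint32_t status;
    volatile uint32_t control;
};

/* Method 1 in miniature: turn a base address into a structure pointer. */
static struct controller *controller_at(uintptr_t base)
{
    assert(base != 0);
    return (struct controller *)base;
}

/* Access the registers indirectly through the pointer. */
static void controller_reset(struct controller *dev)
{
    dev->status = 0;
    dev->control = 0;
}
```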




5.7 Linking C Data Objects Separate From the .bss Section 


The TMS320 DSP C compilers produce several relocatable blocks of code
and data when C code is compiled. These blocks are called sections and can
be allocated into memory in a variety of ways to conform to a variety of system
configurations. The .bss section is used by the compiler for global and static
variables; it is one of the default COFF sections that is used to reserve a
specified amount of space in the memory map that can later be used for storing
data. It is normally uninitialized. By default, all global and static variables
in a C program are placed in the .bss section, but you may want to place some
of them elsewhere. For example, on the floating-point DSPs, you might want to
link all of your variables into off-chip memory but place a frequently used
array in on-chip RAM block 0.


- Method A: Declare variable in a separate section.

  1) Declare the variable that is to be separated from the .bss section in a
     separate file. For example, declare a 32-word array, tapDelay[ ], in a
     file called array.c as follows:

         /* File: ARRAY.C */
         int tapDelay[32];
         /* End of file */

  2) Declare the variable as extern in any file that makes a reference to it.
     Consider the following file, test.c, that makes a reference to the array
     declared in file array.c as follows:

         /* File: TEST.C */
         extern int tapDelay[ ];
         void main(void)
         {
             int i = 0;
             tapDelay[i] = 0;
         }
         /* End of file */
  3) In the linker command file, link this variable separate from the .bss
     section in the SECTIONS section. The following linker command file
     segment illustrates how to link the array tapDelay[ ] onto the ’C3x
     on-chip, dual-access data RAM block 0 while linking the rest of the
     global and static variables into part of a similar data RAM block 1:

         /* File: TEST.CMD */
         test.obj
         array.obj

         MEMORY
         {
             RAMB0: origin = 0x809800, length = 0x400
             RAMB1: origin = 0x809c00, length = 0x400
         }
         SECTIONS
         {
             .bss         : {} > RAMB1
             tapdelayline : { array.obj(.bss) } > RAMB0
         }
         /* End of file */


- Method B: Declare variable in a #pragma DATA_SECTION.

  1) Declare the variable that is to be separated from the .bss section in a
     #pragma DATA_SECTION. Consider the example described in Method A. The
     following code segment uses the DATA_SECTION pragma to declare a
     32-word array, tapDelay[ ], that is placed separate from the other
     global and static variables:

         /* File: TEST.c */
         #pragma DATA_SECTION (tapDelay, ".tapdelayline")
         int tapDelay[32];

         void main(void)
         {
             int i = 0;
             tapDelay[i] = 0;
         }
         /* End of file */


  2) In the linker command file, use the section name .tapdelayline to place
     the array tapDelay[ ] in RAM block 0. Separate it from the other global
     and static variables that are in the .bss section as follows:

         /* File: TEST.CMD */
         test.obj
         array.obj

         MEMORY
         {
             EXT0: origin = 0x100,    len = 0x3f00
             RAM0: origin = 0x809800, len = 0x400
         }
         SECTIONS
         {
             .bss          : {} > EXT0
             .tapdelayline : {} > RAM0
         }
         /* End of file */


Method B is available in the floating-point DSP C compiler version 4.60 or 
greater. It is described in the TMS320 Floating-Point DSP Code Generation 
Tools Release 4.70 Getting Started Guide. 
5.8 Interrupts in C


To use interrupts in C, you must write an interrupt service routine (ISR), initial- 
ize the interrupt vector table, and link these parts with the linker command file. 
These steps are described below. 


Step 1: Write a C language interrupt service routine (ISR).

        The C compiler requires that each ISR be named as follows:

            void c_int0n(void)      /* n is the int number */
            {
                /* a C function that is an ISR */
            }

        The interrupt routine must not return a value and has no arguments.
        The C compiler recognizes this naming convention and treats it as
        a normal ISR. This means it performs a context save of the
        necessary registers and returns from the routine via an RETI
        instruction.

        A good practice is to include the interrupts in a separate file
        called ints.c or something similar. This allows a modular style,
        simpler maintenance, and software that is easy to understand.

Step 2: Initialize the interrupt vector table using either C or assembly
        language.

        In microprocessor mode of the ’C30 and ’C31, the first 0x40
        addresses are reserved for the interrupt and trap vectors. Address
        0 (zero) holds the address of the reset routine. If using the -c
        linker option, the RTS30.lib function boot.asm takes care of
        defining the reset function, but the vector table initialization is
        left to the user.

        An assembly language routine might look like this:

            ; file name is vectors.asm
                    .sect   "vectors"   ; a new section begins here
                    .word   _c_int00    ; the address of the reset vector
                    .word   _c_int01    ; the ISR for interrupt 0
                    .word   _c_int02    ; the ISR for interrupt 1
            ; etc.
            ; end

        This routine creates a new section that is merely a list of
        addresses where the interrupt routines can be found. It can be
        written in C by encapsulating each line in an asm statement.

        For example:

            asm("   .sect \"vectors\" ");


Step 3: Link the interrupt service routine (ISR) and the initialized
        interrupt vector table with the linker command file.

        The linker command file provides the mechanism for including the
        vectors.asm object and the ints.c object.

            /* file name == mylink.cmd */
            vectors.obj
            ints.obj

        The MEMORY section needs to identify the location of the interrupt
        vectors.

            MEMORY
            {
                VECTORS: origin = 0h, length = 40h
            }

        The SECTIONS section needs to map the user-defined section called
        vectors to the memory location.

            SECTIONS
            {
                vectors: > VECTORS
            }
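
The relationship between the vector list and the C routines can be mimicked on a host with a table of function pointers. This is only an illustration of the table-lookup idea under stated assumptions: on the ’C3x the table is the linked vectors section and the hardware performs the lookup, c_int00 is supplied by boot.asm rather than by user code, and the counters and take_vector are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work done by each routine: count how often it ran. */
static volatile unsigned int resets;
static volatile unsigned int int0_count;
static volatile unsigned int int1_count;

void c_int00(void) { ++resets; }        /* stand-in for the reset entry */
void c_int01(void) { ++int0_count; }    /* ISR for interrupt 0          */
void c_int02(void) { ++int1_count; }    /* ISR for interrupt 1          */

/* Same order as the .word list in vectors.asm. */
static void (*const vectors[])(void) = { c_int00, c_int01, c_int02 };

/* Host-side stand-in for the hardware taking vector number n. */
static void take_vector(size_t n)
{
    assert(n < sizeof vectors / sizeof vectors[0]);
    vectors[n]();
}
```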
Chapter 6


DSP Algorithms


Certain features of the ’C3x architecture and instruction set facilitate the solu- 
tion of numerically intensive problems. This chapter presents examples of 
applications using these features, such as companding, filtering, fast Fourier 
transforms (FFTs), and matrix arithmetic. 
6.1 Companding


In telecommunications, conserving channel bandwidth while preserving
speech quality is a primary concern. This is achieved by quantizing the speech
samples logarithmically. An 8-bit logarithmic quantizer produces speech
quality equivalent to a 13-bit uniform quantizer. The logarithmic quantization
is achieved by companding (COMpress/exPANDing). Two international
standards have been established for companding: the µ-law standard (used in
the United States and Japan), and the A-law standard (used in Europe).
Detailed descriptions of µ-law and A-law companding are included in Volume 1
of the book Digital Signal Processing Applications With the TMS320 Family.


During transmission, logarithmically compressed data in sign-magnitude form
is transmitted along the communications channel. If any processing is
necessary, you must expand this data to a 14-bit (for µ-law) or 13-bit (for
A-law) linear format. This operation is performed when the data is received at
the digital signal processor (DSP). After processing, the result is compressed
back to 8-bit format and transmitted through the channel to continue
transmission.


Example 6-1 and Example 6-2 show µ-law compression and expansion (that
is, linear to µ-law and µ-law to linear conversion), while Example 6-3 and
Example 6-4 show A-law compression and expansion. For expansion, using
a look-up table is an alternative approach. A look-up table trades memory
space for speed of execution. Since the compressed data is eight bits long, you
can construct a table with 256 entries containing the expanded data. If the
compressed data is stored in the register AR0, the following two instructions
put the expanded data in register R0:


        ADDI    @TABL,AR0       ; @TABL = BASE ADDRESS OF TABLE
        LDI     *AR0,R0         ; PUT EXPANDED NUMBER IN R0


You could use the same look-up table approach for compression, but the
required table length would be 16384 words for µ-law and 8192 words for A-law.
If this memory size is not acceptable, use the subroutines presented in
Example 6-1 or Example 6-3.
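
The 256-entry expansion table can be generated ahead of time. The C sketch below follows the steps of the MUXPND routine in Example 6-2 (complement, isolate bin and segment, add and remove the bias of 33, apply the sign); the function and table names are hypothetical, and the code is meant for building or checking a table on a host, not as a replacement for the assembly routine.

```c
#include <assert.h>

/* Expand one 8-bit µ-law code to linear form, step by step as in MUXPND. */
static int mulaw_expand(unsigned int code)
{
    unsigned int c = ~code & 0xFFu;             /* complement bits          */
    int value = (int)((c & 0x0Fu) << 1) + 33;   /* quantization bin + bias  */
    unsigned int seg = (c >> 4) & 7u;           /* segment code             */

    assert(code <= 0xFFu);
    value = (int)((unsigned int)value << seg) - 33;  /* shift, remove bias  */
    return (c & 0x80u) ? -value : value;        /* negate if sign bit set   */
}

/* Table indexed by the compressed byte, as used by the two-instruction
   look-up shown above. */
static int mulaw_table[256];

static void mulaw_build_table(void)
{
    unsigned int code;

    for (code = 0; code < 256; code++)
        mulaw_table[code] = mulaw_expand(code);
}
```

The table costs 256 words; the corresponding compression tables would need 2^14 = 16384 and 2^13 = 8192 entries, which is where the sizes quoted above come from.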


Example 6-1. µ-Law Compression

* TITLE µ-LAW COMPRESSION
*
* SUBROUTINE MUCMPR
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+------------------------
*   R0       | NUMBER TO BE CONVERTED
*
* REGISTERS USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1, R2, SP
* REGISTER CONTAINING RESULT: R0
*
* NOTE: SINCE THE STACK POINTER 'SP' IS USED IN THE COMPRESSION
*       ROUTINE 'MUCMPR', MAKE SURE TO INITIALIZE IT IN THE
*       CALLING PROGRAM.
*
* CYCLES: 20    WORDS: 17
*
        .global MUCMPR
*
MUCMPR  LDI     R0,R1           ; Save sign of number
        ABSI    R0,R0
        CMPI    1FDEH,R0        ; If R0>0x1FDE,
        LDIGT   1FDEH,R0        ; saturate the result
        ADDI    33,R0           ; Add bias
        FLOAT   R0              ; Normalize: (seg+5)0WXYZx...x
        MPYF    0.03125,R0      ; Adjust segment number by 2**(-5)
        LSH     1,R0            ; (seg)WXYZx...x
        PUSHF   R0
        POP     R0              ; Treat number as integer
        LSH     -20,R0          ; Right-justify
        LDI     0,R2
        LDI     R1,R1           ; If number is negative,
        LDILT   80H,R2          ; set sign bit
        ADDI    R2,R0           ; R0 = compressed number
        NOT     R0              ; Reverse all bits for transmission
        RETS
Example 6-2. µ-Law Expansion

* TITLE µ-LAW EXPANSION
*
* SUBROUTINE MUXPND
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+------------------------
*   R0       | NUMBER TO BE CONVERTED
*
* REGISTERS USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1, R2, SP
* REGISTER CONTAINING RESULT: R0
*
* CYCLES: 20 (WORST CASE)    WORDS: 14
*
        .global MUXPND
*
MUXPND  NOT     R0,R0           ; Complement bits
        LDI     R0,R1
        AND     0FH,R1          ; Isolate quantization bin
        LSH     1,R1
        ADDI    33,R1           ; Add bias to introduce 1xxxx1
        LDI     R0,R2           ; Store for sign bit
        LSH     -4,R0
        AND     7,R0            ; Isolate segment code
        LSH3    R0,R1,R0        ; Shift and put result in R0
        SUBI    33,R0           ; Subtract bias
        TSTB    80H,R2          ; Test sign bit
        RETSZ
        NEGI    R0              ; Negate if a negative number
        RETS
Example 6-3. A-Law Compression

* TITLE A-LAW COMPRESSION
*
* SUBROUTINE ACMPR
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+------------------------
*   R0       | NUMBER TO BE CONVERTED
*
* REGISTERS USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1, R2, SP
* REGISTER CONTAINING RESULT: R0
*
* NOTE: SINCE THE STACK POINTER 'SP' IS USED IN THE COMPRESSION
*       ROUTINE 'ACMPR', MAKE SURE TO INITIALIZE IT IN THE
*       CALLING PROGRAM.
*
* CYCLES: 22    WORDS: 19
*
        .global ACMPR
*
ACMPR   LDI     R0,R1           ; Save sign of number
        ABSI    R0,R0
        CMPI    1FH,R0          ; If R0<0x20,
        BLED    END             ; do linear coding
        CMPI    0FFFH,R0        ; If R0>0xFFF,
        LDIGT   0FFFH,R0        ; saturate the result
        LSH     -1,R0           ; Eliminate rightmost bit
        FLOAT   R0              ; Normalize: (seg+3)0WXYZx...x
        MPYF    0.125,R0        ; Adjust segment number by 2**(-3)
        LSH     1,R0            ; (seg)WXYZx...x
        PUSHF   R0
        POP     R0              ; Treat number as integer
        LSH     -20,R0          ; Right-justify
END     LDI     0,R2
        LDI     R1,R1           ; If number is negative,
        LDILT   80H,R2          ; set sign bit
        ADDI    R2,R0           ; R0 = compressed number
        XOR     0D5H,R0         ; Invert even bits
*                               ; for transmission
        RETS
Example 6-4. A-Law Expansion

*   TITLE A-LAW EXPANSION
*
*   SUBROUTINE AXPND
*
*   ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+------------------------
*   R0       | NUMBER TO BE CONVERTED
*
*   REGISTERS USED AS INPUT: R0
*   REGISTERS MODIFIED: R0, R1, R2, SP
*   REGISTER CONTAINING RESULT: R0
*
*   CYCLES: 25 (WORST CASE)     WORDS: 16
*
        .global AXPND
*
AXPND   XOR     0D5H,R0         ; Invert even bits
        LDI     R0,R1
        AND     0FH,R1          ; Isolate quantization bin
        LSH     1,R1
        LDI     R0,R2           ; Store for sign bit
        LSH     -4,R0
        AND     7,R0            ; Isolate segment code
        BZ      SKIP1
        SUBI    1,R0
        ADDI    32,R1           ; Create 1xxxx1
SKIP1   ADDI    1,R1            ; OR 0xxxx1
        LSH3    R0,R1,R0        ; Shift and put result in R0
        TSTB    80H,R2          ; Test sign bit
        RETSZ
        NEGI    R0              ; Negate if a negative number
        RETS
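As with the mu-law routine, the AXPND steps can be traced in a short higher-level sketch. The Python below mirrors the instruction sequence of the listing and is illustrative only; the name `a_law_expand` is an assumption made for this example.

```python
def a_law_expand(code: int) -> int:
    """Expand one 8-bit A-law code word, following the AXPND steps."""
    c = (code ^ 0xD5) & 0xFF          # XOR 0D5H: undo the bit inversion
    mantissa = (c & 0x0F) << 1        # quantization bin, shifted left by 1
    segment = (c >> 4) & 0x07         # 3-bit segment code
    if segment != 0:                  # segments 1..7 carry an implied leading 1
        segment -= 1
        mantissa += 32                # create 1xxxx1
    mantissa += 1                     # segment 0 yields 0xxxx1
    magnitude = mantissa << segment
    # Bit 7 of the de-inverted word is the sign bit tested by TSTB.
    return -magnitude if c & 0x80 else magnitude
```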
6.2 FIR, IIR, and Adaptive Filters

Digital filters are a common requirement for DSPs. There are two types of
digital filters: finite impulse response (FIR) and infinite impulse response
(IIR). Both of these types can have either fixed or adaptable coefficients.
This section presents the fixed-coefficient filters first, followed by the
adaptive filters.

6.2.1 FIR Filters

If the FIR filter has an impulse response h[0], h[1], ..., h[N-1], and x[n]
represents the input of the filter at time n, the output y[n] at time n is
given by this equation:

    y[n] = h[0] x[n] + h[1] x[n-1] + ... + h[N-1] x[n-(N-1)]

Two features of the ’C3x that facilitate the implementation of the FIR filters
are parallel multiply/add operations and circular addressing. The former
permits the performance of a multiplication and an addition in a single
machine cycle, while the latter makes a finite buffer of length N sufficient
for the data x.

Figure 6-1 shows the arrangement of memory locations necessary to implement
circular addressing, while Example 6-5 presents the ’C3x assembly code for an
FIR filter.

Figure 6-1. Data Memory Organization for an FIR Filter

            Impulse      Initial                       Final
            response     input samples                 input samples

Low         h(N-1)       x[n-(N-1)]  (oldest input)    x[n]
address     h(N-2)       x[n-(N-2)]                    x[n-(N-1)]
              .              .                            .
              .              .       (circular queue)     .
              .              .                            .
            h(1)         x[n-1]                        x[n-2]
High        h(0)         x[n]        (newest input)    x[n-1]
address

To set up circular addressing, initialize the block-size register BK to block
length N. Start the locations for signal x from a memory location whose
address is a multiple of the smallest power of 2 that is greater than N. For
instance, if N = 24, the first address for x is a multiple of 32 (the lowest
five bits of the beginning address are 0). See the Circular Addressing section
in the Addressing chapter of the TMS320C3x User’s Guide for more information.
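The FIR equation and the circular-queue idea can be illustrated with a short sketch. The Python below is not the ’C3x implementation; it models the circular buffer with a modulo index, and the names `CircularFIR` and `circular_alignment` are assumptions made for this example.

```python
def circular_alignment(n: int) -> int:
    """Smallest power of 2 that is greater than n (start-address alignment)."""
    return 1 << n.bit_length()


class CircularFIR:
    """FIR filter y[n] = sum of h[i] * x[n-i] over a length-N circular buffer."""

    def __init__(self, h):
        self.h = list(h)
        self.n_taps = len(self.h)
        self.buf = [0.0] * self.n_taps   # holds the last N input samples
        self.pos = 0                     # slot that receives the next input

    def step(self, x):
        self.buf[self.pos] = x           # newest input overwrites the oldest
        acc = 0.0
        idx = self.pos
        for coeff in self.h:             # h[0] pairs with the newest sample
            acc += coeff * self.buf[idx]
            idx = (idx - 1) % self.n_taps
        self.pos = (self.pos + 1) % self.n_taps
        return acc
```

Feeding a unit impulse into `CircularFIR([1.0, 2.0, 3.0])` returns the coefficients in order, which is a quick way to check the indexing.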
In Example 6-5, the pointer to the input sequence x is incremented and is
assumed to be moving from an older input to a newer input. At the end of the
subroutine, AR1 points to the position for the next input sample.


Example 6-5. FIR Filter

*   TITLE FIR FILTER
*
*   SUBROUTINE FIR
*
*   EQUATION:  y(n) = h(0) * x(n) + h(1) * x(n-1) +
*                     ... + h(N-1) * x(n-(N-1))
*
*   TYPICAL CALLING SEQUENCE:
*
*       LOAD    AR0
*       LOAD    AR1
*       LOAD    RC
*       LOAD    BK
*       CALL    FIR
*
*   ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------
*   AR0      | ADDRESS OF h(N-1)
*   AR1      | ADDRESS OF x(n-(N-1))
*   RC       | LENGTH OF FILTER - 2 (N-2)
*   BK       | LENGTH OF FILTER (N)
*
*   REGISTERS USED AS INPUT: AR0, AR1, RC, BK
*   REGISTERS MODIFIED: R0, R2, AR0, AR1, RC
*   REGISTER CONTAINING RESULT: R0
*
*   CYCLES: 11 + (N-1)      WORDS: 6
*
        .global FIR
*                                       ; Initialize R0:
FIR     MPYF3   *AR0++(1),*AR1++(1)%,R0
*                                       ; h(N-1) * x(n-(N-1)) -> R0
        LDF     0.0,R2                  ; Initialize R2
*
*   FILTER (1 <= i < N)
*
        RPTS    RC                      ; Set up the repeat cycle
        MPYF3   *AR0++(1),*AR1++(1)%,R0 ; h(N-1-i) * x(n-(N-1-i)) -> R0
||      ADDF3   R0,R2,R2                ; Multiply and add operation
*
        ADDF    R0,R2,R0                ; Add last product
*
*   RETURN SEQUENCE
*
        RETS                            ; Return
*
*   end
*
        .end
6.2.2 IIR Filters

The transfer function of the IIR filters has both poles and zeros. Its output
depends on both the input and the past output. As a rule, the IIR filters need
less computation than an FIR with similar frequency response, but the filters
have the drawback of being sensitive to coefficient quantization. Most often,
the IIR filters are implemented as a cascade of second-order sections, called
biquads. Example 6-6 shows the implementation for one biquad.

This is the equation for a single biquad:

    y[n] = a1 y[n-1] + a2 y[n-2] + b0 x[n] + b1 x[n-1] + b2 x[n-2]

However, the following two equations are more convenient and have smaller
storage requirements:

    d[n] = a2 d[n-2] + a1 d[n-1] + x[n]
    y[n] = b2 d[n-2] + b1 d[n-1] + b0 d[n]

Figure 6-2 shows the memory organization for this two-equation approach,
and Example 6-7 shows the implementation for any number of biquads.
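The two-equation form keeps only two delay-node values per biquad. The following Python sketch illustrates that recurrence; it is not the ’C3x code, and the function name `biquad_step` and the tuple layout of the coefficients and state are assumptions made for this example.

```python
def biquad_step(x, coeffs, state):
    """One sample of a biquad in the two-equation (delay-node) form.

    coeffs = (a1, a2, b0, b1, b2); state = (d[n-1], d[n-2]).
    Returns the output y[n] and the updated state (d[n], d[n-1]).
    """
    a1, a2, b0, b1, b2 = coeffs
    d1, d2 = state
    d0 = a2 * d2 + a1 * d1 + x           # d[n] = a2 d[n-2] + a1 d[n-1] + x[n]
    y = b2 * d2 + b1 * d1 + b0 * d0      # y[n] = b2 d[n-2] + b1 d[n-1] + b0 d[n]
    return y, (d0, d1)
```

For a1 = 0.5, a2 = -0.25, b0 = 1, b1 = 2, b2 = 1, a unit impulse produces 1, 2.5, 2, 0.375, which agrees with the single-equation form of the biquad.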
Figure 6-2. Data Memory Organization for a Single Biquad

[Figure not reproduced: filter coefficients and delay node values laid out
from low address to high address; the delay node values form a circular queue
running from the newest delay to the oldest delay.]


As in the case of FIR filters, the address for the start of the d values must be 
a multiple of 4; that is, the last two bits of the beginning address must be 0. The 
block-size register BK must be initialized to 3. 


Example 6-6. IIR Filter (One Biquad)

*   TITLE IIR FILTER
*
*   SUBROUTINE IIR1
*
*   IIR1 == IIR FILTER (ONE BIQUAD)
*
*   EQUATIONS: d(n) = a2 * d(n-2) + a1 * d(n-1) + x(n)
*              y(n) = b2 * d(n-2) + b1 * d(n-1) + b0 * d(n)
*
*   OR         y(n) = a1*y(n-1) + a2*y(n-2) + b0*x(n)
*                     + b1*x(n-1) + b2*x(n-2)
*
*   TYPICAL CALLING SEQUENCE:
*
*       load    R2
*       load    AR0
*       load    AR1
*       load    BK
*       CALL    IIR1
*
*   ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------------------
*   R2       | INPUT SAMPLE X(N)
*   AR0      | ADDRESS OF FILTER COEFFICIENTS (A2)
*   AR1      | ADDRESS OF DELAY NODE VALUES (D(N-2))
*   BK       | BK = 3
*
*   REGISTERS USED AS INPUT: R2, AR0, BK
*   REGISTERS MODIFIED: R0, R1, R2, AR0, AR1
*   REGISTER CONTAINING RESULT: R0
*
*   CYCLES: 11      WORDS: 8
*
*   FILTER
*
        .global IIR1
*
IIR1    MPYF3   *AR0,*AR1,R0
*                                       ; a2 * d(n-2) -> R0
        MPYF3   *++AR0(1),*AR1--(1)%,R1
*                                       ; b2 * d(n-2) -> R1
        MPYF3   *++AR0(1),*AR1,R0       ; a1 * d(n-1) -> R0
||      ADDF3   R0,R2,R2                ; a2*d(n-2)+x(n) -> R2
*
        MPYF3   *++AR0(1),*AR1--(1)%,R0 ; b1 * d(n-1) -> R0
||      ADDF3   R0,R2,R2                ; a1*d(n-1)+a2*d(n-2)+x(n) -> R2
*
        MPYF3   *++AR0(1),R2,R2         ; b0 * d(n) -> R2
||      STF     R2,*AR1++(1)%
*                                       ; Store d(n) and point to d(n-1)
        ADDF    R0,R2                   ; b1*d(n-1)+b0*d(n) -> R2
        ADDF    R1,R2,R0                ; b2*d(n-2)+b1*d(n-1)
*                                       ;   +b0*d(n) -> R0
*
*   RETURN SEQUENCE
*
        RETS                            ; Return
*
*   end
*
        .end
In the more general case, the IIR filter contains N>1 biquads. The equations
for its implementation are given by the following pseudo-C language code:

    y[-1,n] = x[n]
    for (i = 0; i < N; i++) {
        d[i,n] = a2[i] d[i,n-2] + a1[i] d[i,n-1] + y[i-1,n]
        y[i,n] = b2[i] d[i,n-2] + b1[i] d[i,n-1] + b0[i] d[i,n]
    }
    y[n] = y[N-1,n]

Figure 6-3 shows the corresponding memory organization, while Example 6-7
shows the ’C3x assembly-language code.
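The cascade recurrence can be sketched in a few lines. This Python illustration feeds the output of each biquad into the next one; the name `cascade_step` and the list-of-tuples layout for the coefficients and delay nodes are assumptions made for this example, not the memory layout used on the ’C3x.

```python
def cascade_step(x, sections, states):
    """One sample through N cascaded biquads.

    sections[i] = (a1, a2, b0, b1, b2) for biquad i.
    states[i]   = (d[i,n-1], d[i,n-2]); updated in place.
    """
    y = x                                  # input to the first biquad
    for i, (a1, a2, b0, b1, b2) in enumerate(sections):
        d1, d2 = states[i]
        d0 = a2 * d2 + a1 * d1 + y         # d[i,n]
        y = b2 * d2 + b1 * d1 + b0 * d0    # y[i,n], input to biquad i+1
        states[i] = (d0, d1)
    return y                               # y[n] = y[N-1,n]
```

As a simple check, a one-sample delay section followed by a gain-of-2 section turns the input sequence 1, 5 into 0, 2.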


Figure 6-3. Data Memory Organization for N Biquads

[Figure not reproduced: filter coefficients with the initial and final delay
node values, from low address to high address; the delay node values of each
biquad form a circular queue.]

You must initialize the block register BK to 3; the beginning of each set of
d values (that is, d[i,n], i = 0, ..., N-1) must be at an address that is a
multiple of 4 (where the last two bits are 0).


Example 6-7. IIR Filters (N > 1 Biquads)

*   TITLE IIR FILTERS (N > 1 BIQUADS)
*
*   SUBROUTINE IIR2
*
*   EQUATIONS: y(-1,n) = x(n)
*
*              FOR (i = 0; i < N; i++)
*              {
*              d(i,n) = a2(i) * d(i,n-2) + a1(i) * d(i,n-1) + y(i-1,n)
*              y(i,n) = b2(i) * d(i,n-2) + b1(i) * d(i,n-1) + b0(i) * d(i,n)
*              }
*              y(n) = y(N-1,n)
*
*   TYPICAL CALLING SEQUENCE:
*
*       load    R2
*       load    AR0
*       load    AR1
*       load    IR0
*       load    IR1
*       load    BK
*       load    RC
*       CALL    IIR2
*
*   ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+-------------------------------------------
*   R2       | INPUT SAMPLE x(n)
*   AR0      | ADDRESS OF FILTER COEFFICIENTS (a2(0))
*   AR1      | ADDRESS OF DELAY NODE VALUES (d(0,n-2))
*   BK       | BK = 3
*   IR0      | IR0 = 4
*   IR1      | IR1 = 4*N-4
*   RC       | NUMBER OF BIQUADS (N) - 2
*
*   REGISTERS USED AS INPUT: R2, AR0, AR1, IR0, IR1, BK, RC
*   REGISTERS MODIFIED: R0, R1, R2, AR0, AR1, RC
*   REGISTER CONTAINING RESULT: R0
*
*   CYCLES: 17 + 6N     WORDS: 17
*
        .global IIR2
*
IIR2    MPYF3   *AR0,*AR1,R0
*                                        ; a2(0) * d(0,n-2) -> R0
        MPYF3   *++AR0(1),*AR1--(1)%,R1
*                                        ; b2(0) * d(0,n-2) -> R1
        MPYF3   *++AR0(1),*AR1,R0        ; a1(0) * d(0,n-1) -> R0
||      ADDF3   R0,R2,R2                 ; First sum term of d(0,n)
*
        MPYF3   *++AR0(1),*AR1--(1)%,R0  ; b1(0) * d(0,n-1) -> R0
||      ADDF3   R0,R2,R2                 ; Second sum term of d(0,n)
        MPYF3   *++AR0(1),R2,R2          ; b0(0) * d(0,n) -> R2
||      STF     R2,*AR1--(1)%            ; Store d(0,n);
*                                        ;   point to d(0,n-2)
*
        RPTB    LOOP                     ; Loop for 1 <= i < N
*
        MPYF3   *++AR0(1),*++AR1(IR0),R0 ; a2(i) * d(i,n-2) -> R0
||      ADDF3   R0,R2,R2                 ; First sum term of y(i-1,n)
*
        MPYF3   *++AR0(1),*AR1--(1)%,R1  ; b2(i) * d(i,n-2) -> R1
||      ADDF3   R1,R2,R2                 ; Second sum term
*                                        ;   of y(i-1,n)
*
        MPYF3   *++AR0(1),*AR1,R0        ; a1(i) * d(i,n-1) -> R0
||      ADDF3   R0,R2,R2                 ; First sum term of d(i,n)
*
        MPYF3   *++AR0(1),*AR1--(1)%,R0  ; b1(i) * d(i,n-1) -> R0
||      ADDF3   R0,R2,R2                 ; Second sum term of d(i,n)
*
LOOP    STF     R2,*AR1--(1)%            ; Store d(i,n);
*                                        ;   point to d(i,n-2)
||      MPYF3   *++AR0(1),R2,R2          ; b0(i) * d(i,n) -> R2
*
*   FINAL SUMMATION
*
        ADDF    R0,R2                    ; First sum term of y(N-1,n)
        ADDF3   R1,R2,R0                 ; Second sum term
*                                        ;   of y(N-1,n)
*
        NOP     *AR1--(IR1)              ; Return to first biquad
        NOP     *AR1--(1)%               ; Point to d(0,n-1)
*
*   RETURN SEQUENCE
*
        RETS                             ; Return
*
*   end
*
        .end


6.2.3 Adaptive Filters (Least Mean Squares Algorithm)

In some applications in digital signal processing, you must adapt a filter
over time to keep track of changing conditions. This is accomplished by
computing a new value for each filter coefficient at every sample by means of
a least mean squares (LMS) algorithm. The equations for this process are
described below.

The book Theory and Design of Adaptive Filters presents the theory of
adaptive filters. Although, in theory, both FIR and IIR structures can be used
as adaptive filters, the stability problems and the local optimum points that
the IIR filters exhibit make them less attractive for such an application.
Hence, until further research makes IIR filters a better choice, only the FIR
filters are used in adaptive algorithms of practical applications.

In an adaptive FIR filter, the filtering equation takes this form:

    y[n] = h[n,0] x[n] + h[n,1] x[n-1] + ... + h[n,N-1] x[n-(N-1)]

The filter coefficients are time-dependent and updated through LMS
algorithms. In an LMS algorithm, the coefficients are updated by an equation
in this form:

    h[n+1,i] = h[n,i] + β e[n] x[n-i],    i = 0, 1, ..., N-1

where e[n] = d[n] - y[n] is the error, β is a constant for the computation,
and d[n] is the desired signal. You can interleave the updating of the filter
coefficients with the computation of the filter output so that it takes three
cycles per filter tap to do both. The updated coefficients are written over
the old filter coefficients.
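The filtering and update equations can be combined in one pass over the taps. The Python sketch below illustrates this for a single sample; the function name `lms_step` and the convention that `x_hist[i]` holds x[n-i] are assumptions made for this example and do not reflect the ’C3x memory organization.

```python
def lms_step(h, x_hist, d, beta):
    """One LMS iteration: filter output, error, and in-place coefficient update.

    h[i]      = h[n,i], overwritten with h[n+1,i]
    x_hist[i] = x[n-i] for i = 0..N-1
    d         = desired signal d[n]
    beta      = adaptation constant
    """
    y = sum(hi * xi for hi, xi in zip(h, x_hist))   # y[n]
    e = d - y                                       # e[n] = d[n] - y[n]
    for i in range(len(h)):
        h[i] += beta * e * x_hist[i]                # h[n+1,i]
    return y, e
```

Starting from zero coefficients with x_hist = [1, 2], d = 1, and beta = 0.5, the first call returns y = 0 and e = 1 and leaves h = [0.5, 1.0].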




Example 6-8 shows the implementation of an adaptive FIR filter on the ’C3x.
The memory organization and the positioning of the data in memory follow
the same rules that apply to the FIR filter described in section 6.2.1 on page
6-7.


Example 6-8. Adaptive FIR Filter (LMS Algorithm)

*   LMS == LMS ADAPTIVE FILTER
*
*   EQUATIONS: y(n) = h(n,0)*x(n) + h(n,1)*x(n-1) + ... + h(n,N-1)*x(n-(N-1))
*              e(n) = d(n) - y(n)
*              for (i = 0; i < N; i++)
*                  h(n+1,i) = h(n,i) + mu * e(n) * x(n-i)
*
*   TYPICAL CALLING SEQUENCE:
*
*       load    R4
*       load    AR0
*       load    AR1
*       load    AR6
*       load    RC
*       load    BK
*       CALL    LMS
*
*   ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+----------------------------
*   R4       | scale factor (2 * mu * err)
*   AR0      | address of h(n,N-1)
*   AR1      | address of x(n-(N-1))
*   AR6      | address of d(n)
*   RC       | length of filter - 2 (N-2)
*   BK       | length of filter (N)
*
*   REGISTERS USED AS INPUT: R4, AR0, AR1, RC, BK
*   REGISTERS MODIFIED: R0, R1, R2, R5, AR0, AR1, RC
*   REGISTER CONTAINING RESULT: R0
*
*   PROGRAM SIZE: 11 words
*
*   EXECUTION CYCLES: 13 + 3N
*
*   setup (i = 0)
*
*       ldf     *ar0++,r5                ; Get desired sample
*       mpyf3   *ar0--%,*ar1++(1)%,r0    ; h(n,N-1) * x(n-(N-1)) -> R0
*   ||  subf    r2,r2,r2                 ; init r2
*
        .text
*                                        ; Initialize R0:
LMS     MPYF3   *AR0,*AR1,R0
*                                        ; h(n,N-1) * x(n-(N-1)) -> R0
        LDF     0.0,R2                   ; Initialize R2
*                                        ; Initialize R1:
        MPYF3   *AR1++(1)%,R4,R1         ; x(n-(N-1)) * tmuerr -> R1
        ADDF3   *AR0++(1),R1,R1
*                                        ; h(n,N-1) + x(n-(N-1)) *
*                                        ;   tmuerr -> R1
*
*   FILTER AND UPDATE (1 <= I < N)
*
        RPTB    LOOP                     ; Set up the repeat block
*                                        ; Filter:
        MPYF3   *AR0--(1),*AR1,R0        ; h(n,N-1-i)
*                                        ;   * x(n-(N-1-i)) -> R0
||      ADDF3   R0,R2,R2                 ; Multiply and add operation
*
*                                        ; Update:
        MPYF3   *AR1++(1)%,R4,R1         ; x(n-(N-1-i)) * tmuerr -> R1
||      STF     R1,*AR0++(1)             ; R1 -> h(n+1,N-1-(i-1))
*
LOOP    ADDF3   *AR0++(1),R1,R1
*                                        ; h(n,N-1-i) + x(n-(N-1-i))
*                                        ;   * tmuerr -> R1
*
        ADDF3   R0,R2,R0                 ; Add last product
        STF     R1,*-AR0(1)              ; h(n,0) + x(n)
*                                        ;   * tmuerr -> h(n+1,0)
*
*   RETURN SEQUENCE
*
        RETS                             ; Return
*
*   end
*
        .end
6.3 Lattice Filters

The lattice form is an alternative way of implementing digital filters. It has
found applications in speech processing, spectral estimation, and other areas.
In this discussion, the notation and terminology from speech processing
applications are used.

If H(z) is the transfer function of a digital filter that has only poles,
A(z) = 1/H(z) is a filter having only zeros, and is called the inverse filter.
The inverse lattice filter is shown in Figure 6-4. These equations describe
the filter in mathematical terms:

    f(i,n) = f(i-1,n) + k(i) b(i-1,n-1)
    b(i,n) = b(i-1,n-1) + k(i) f(i-1,n)

Initial conditions:

    f(0,n) = b(0,n) = x(n)

Final conditions:

    y(n) = f(p,n)

In the above equation, f (i,n) is the forward error, b (i,n) is the backward error, 
k (i) is the i-th reflection coefficient, x (n) is the input, and y (n) is the output 
signal. The order of the filter (that is, the number of stages) is p. In the linear 
predictive coding (LPC) method of speech processing, the inverse lattice filter 
is used during analysis, and the (forward) lattice filter during speech synthesis. 
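The inverse-lattice recurrence can be traced stage by stage in a short sketch. The Python below is illustrative only; the name `lattice_inverse_step` and the list layout (`k[i-1]` holds k(i), `b_prev[i]` holds b(i,n-1)) are assumptions made for this example.

```python
def lattice_inverse_step(x, k, b_prev):
    """One sample of the p-stage inverse lattice filter.

    k[i-1]    = reflection coefficient k(i), i = 1..p
    b_prev[i] = b(i, n-1) for i = 0..p-1; updated in place to b(i, n)
    Returns y(n) = f(p, n).
    """
    f = x                                  # f(0,n) = x(n)
    b = x                                  # b(0,n) = x(n)
    for i in range(len(k)):
        f_next = f + k[i] * b_prev[i]      # f(i+1,n)
        b_next = b_prev[i] + k[i] * f      # b(i+1,n)
        b_prev[i] = b                      # save b(i,n) for the next sample
        f, b = f_next, b_next
    return f
```

With a single stage, k(1) = 0.5, and zero initial state, the inputs 1, 2 produce the outputs 1, 2.5.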


Figure 6-4. Structure of the Inverse Lattice Filter

[Figure not reproduced: lattice of p stages with reflection coefficients
k1, k2, ..., kp. The forward path carries f(0,n) = x(n), f(1,n), ...,
f(p-1,n) to f(p,n) = y(n); the backward path carries b(0,n), b(1,n), ...,
b(p-1,n).]

Figure 6-5 shows the data memory organization of the inverse lattice filter on
the ’C3x.

Figure 6-5. Data Memory Organization for Forward and Inverse Lattice Filters

            Reflection       Backward
            coefficients     propagation terms

Low         k(1)             b(0,n-1)
address     k(2)             b(1,n-1)
              .                 .
              .                 .
High        k(p)             b(p-1,n-1)
address

Example 6-9 shows the implementation of an inverse lattice filter.

Example 6-9. Inverse Lattice Filter
*   TITLE INVERSE LATTICE FILTER
*
*   SUBROUTINE LATINV
*
*   LATINV == LATTICE FILTER (LPC INVERSE FILTER - ANALYSIS)
*
*   TYPICAL CALLING SEQUENCE:
*
*       load    R2
*       load    AR0
*       load    AR1
*       load    RC
*       CALL    LATINV
*
*   ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+------------------------------------
*   R2       | f(0,n) = x(n)
*   AR0      | ADDRESS OF FILTER COEFFICIENTS
*   AR1      | ADDRESS OF BACKWARD PROPAGATION
*            |   VALUES (b(0,n-1))
*   RC       | RC = p-2
*
*   REGISTERS USED AS INPUT: R2, AR0, AR1, RC
*   REGISTERS MODIFIED: R0, R1, R2, R3, RS, RE, RC, AR0, AR1
*   REGISTER CONTAINING RESULT: R2 (f(p,n))
*
*   PROGRAM SIZE: 10 WORDS
*
*   EXECUTION CYCLES: 13 + 3 * (p-1)
*
        .global LATINV
*
LATINV  MPYF3   *AR0,*AR1,R0
*                                       ; k(1) * b(0,n-1) -> R0
*                                       ; Assume f(0,n) -> R2.
        LDF     R2,R3                   ; Put b(0,n) = f(0,n) -> R3.
        MPYF3   *AR0++(1),R2,R1
*                                       ; k(1) * f(0,n) -> R1
*
*   2 <= i <= p
*
        RPTB    LOOP
        MPYF3   *AR0,*++AR1(1),R0       ; k(i) * b(i-1,n-1) -> R0
||      ADDF3   R2,R0,R2                ; f(i-1-1,n)+k(i-1)
*                                       ;   *b(i-1-1,n-1)
*                                       ;   = f(i-1,n) -> R2
*
*                                       ; b(i-1-1,n-1)+k(i-1)*f(i-1-1,n)
        ADDF3   *-AR1(1),R1,R3          ;   = b(i-1,n) -> R3
||      STF     R3,*-AR1(1)             ; b(i-1-1,n) -> b(i-1-1,n-1)
*
LOOP    MPYF3   *AR0++(1),R2,R1
*                                       ; k(i) * f(i-1,n) -> R1
*
*   I = P+1 (CLEANUP)
*
        ADDF3   R2,R0,R2                ; f(p-1,n)+k(p)*b(p-1,n-1)
*                                       ;   = f(p,n) -> R2
*
*                                       ; b(p-1,n-1)+k(p)*f(p-1,n)
        ADDF3   *AR1,R1,R3              ;   = b(p,n) -> R3
||      STF     R3,*AR1                 ; b(p-1,n) -> b(p-1,n-1)
*
*   RETURN SEQUENCE
*
        RETS                            ; RETURN
*
*   end
*
        .end
The forward lattice filter is similar in structure to the inverse filter, as
shown in Figure 6-6.

Figure 6-6. Structure of the (Forward) Lattice Filter

[Figure not reproduced: p-stage lattice with forward terms f(i,n) and
backward terms b(i,n).]

These corresponding equations describe the lattice filter:

    f(i-1,n) = f(i,n) - k(i) b(i-1,n-1)
    b(i,n) = b(i-1,n-1) + k(i) f(i-1,n)

Initial conditions:

    f(p,n) = x(n),  b(i,n-1) = 0  for i = 1, ..., p

Final conditions:

    y(n) = f(0,n)

The data memory organization is identical to that of the inverse filter, as
shown in Figure 6-5 on page 6-19. Example 6-10 shows the implementation of the
lattice filter on the ’C3x.
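The forward-lattice recurrence runs from stage p down to stage 1. The Python sketch below illustrates it for one sample; the name `lattice_forward_step` and the list layout are assumptions made for this example, as is the choice to save f(0,n) as b(0,n) for the next sample (taken from the final store in the listing of Example 6-10).

```python
def lattice_forward_step(x, k, b_prev):
    """One sample of the p-stage (forward) lattice filter.

    k[i-1]    = reflection coefficient k(i), i = 1..p
    b_prev[i] = b(i, n-1) for i = 0..p-1; updated in place to b(i, n)
    Returns y(n) = f(0, n).
    """
    p = len(k)
    f = x                                        # f(p,n) = x(n)
    b_new = [0.0] * (p + 1)
    for i in range(p, 0, -1):                    # stages p, p-1, ..., 1
        f = f - k[i - 1] * b_prev[i - 1]         # f(i-1,n)
        b_new[i] = b_prev[i - 1] + k[i - 1] * f  # b(i,n)
    b_new[0] = f                                 # b(0,n) = f(0,n) (assumption)
    b_prev[:] = b_new[:p]                        # keep b(0..p-1,n) for next call
    return f
```

With k = [0.5, 0.25] and zero initial state, a unit impulse yields 1 followed by -0.625, the first two terms of 1/A(z) for A(z) = 1 + 0.625 z^-1 + 0.25 z^-2.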
Example 6-10. Lattice Filter

*   TITLE LATTICE FILTER
*
*   SUBROUTINE LATICE
*
*       LOAD    AR0
*       LOAD    AR1
*       LOAD    RC
*       CALL    LATICE
*
*   ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------------------------------
*   R2       | F(P,N) = E(N) = EXCITATION
*   AR0      | ADDRESS OF FILTER COEFFICIENTS (K(P))
*   AR1      | ADDRESS OF BACKWARD PROPAGATION VALUES (B(P-1,N-1))
*   IR0      | 3
*   RC       | RC = P - 3
*
*   REGISTERS USED AS INPUT: R2, AR0, AR1, RC
*   REGISTERS MODIFIED: R0, R1, R2, R3, RS, RE, RC, AR0, AR1
*   REGISTER CONTAINING RESULT: R2 (f(0,n))
*
*   STACK USAGE: NONE
*
*   PROGRAM SIZE: 12 WORDS
*
*   EXECUTION CYCLES: 15 + 3 * (P-2)
*
        .global LATICE
*
LATICE  MPYF3   *AR0,*AR1,R0
*                                       ; K(P) * B(P-1,N-1) -> R0
*                                       ; Assume F(P,N) -> R2
        SUBF3   R0,R2,R2                ; F(P,N)-K(P)*B(P-1,N-1)
*                                       ;   = F(P-1,N) -> R2
||      MPYF3   *--AR0(1),*--AR1(1),R0
*                                       ; K(P-1) * B(P-2,N-1) -> R0
        SUBF3   R0,R2,R2                ; F(P-1,N)-K(P-1)*B(P-2,N-1)
*                                       ;   = F(P-2,N) -> R2
||      MPYF3   *--AR0(1),*--AR1(1),R0
*                                       ; K(P-2) * B(P-3,N-1) -> R0
        MPYF3   R2,*+AR0(1),R1          ; F(P-2,N) * K(P-1) -> R1
        ADDF3   R1,*+AR1(1),R3          ; F(P-2,N) * K(P-1) + B(P-2,N-1)
*                                       ;   = B(P-1,N) -> R3
*
*   1 <= I <= P-2
*
        RPTB    LOOP
        SUBF3   R0,R2,R2                ; F(I,N) - K(I) * B(I-1,N-1)
*                                       ;   = F(I-1,N) -> R2
||      MPYF3   *--AR0(1),*--AR1(1),R0
*                                       ; K(I-1) * B(I-2,N-1) -> R0
        STF     R3,*+AR1(IR0)           ; B(I+1,N) -> B(I+1,N-1)
||      MPYF3   R2,*+AR0(1),R1          ; F(I-1,N) * K(I) -> R1
LOOP    ADDF3   R1,*+AR1(1),R3          ; F(I-1,N) * K(I) + B(I-1,N-1)
*                                       ;   = B(I,N) -> R3
        STF     R3,*+AR1(2)             ; B(1,N) -> B(1,N-1)
        STF     R2,*+AR1(1)             ; F(0,N) -> B(0,N-1)
*
*   RETURN SEQUENCE
*
        RETS                            ; RETURN
*
*   END
*
        .end
6.4 Matrix-Vector Multiplication

In matrix-vector multiplication, a K x N matrix of elements m(i,j) having K
rows and N columns is multiplied by an N x 1 vector to produce a K x 1 result.
The multiplier vector has elements v(j), and the product vector has elements
p(i). Each one of the product-vector elements is computed by the following
expression:

    p(i) = m(i,0) v(0) + m(i,1) v(1) + ... + m(i,N-1) v(N-1),   i = 0, 1, ..., K-1

This is essentially a dot product, and the matrix-vector multiplication
contains, as a special case, the dot product presented in Example 2-1 on page
2-3. In pseudo-C format, the computation of the matrix multiplication is
expressed by:

    for (i = 0; i < K; i++) {
        p(i) = 0
        for (j = 0; j < N; j++)
            p(i) = p(i) + m(i,j) * v(j)
    }

Figure 6-7 shows the data memory organization for matrix-vector
multiplication, and Example 6-11 shows the ’C3x assembly code that implements
it. Note that in Example 6-11, K (number of rows) must be greater than 0 and N
(number of columns) must be greater than 1.
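The pseudo-C loop can be written against a flat, row-by-row matrix, which is closer to how a pointer walks through memory. The Python sketch below is an illustration only; the function name `mat_vec_flat` and the row-major storage of the matrix are assumptions made for this example.

```python
def mat_vec_flat(m_flat, v, k_rows, n_cols):
    """Multiply a K x N matrix, stored row by row in m_flat, by the vector v."""
    p = []
    mi = 0                                # running index into the matrix
    for _ in range(k_rows):
        acc = 0.0
        for j in range(n_cols):           # dot product of one row with v
            acc += m_flat[mi] * v[j]
            mi += 1
        p.append(acc)                     # p(i)
    return p
```

For the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6) and the vector (1, 0, -1), both product elements are -2.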


Figure 6—7. Data Memory Organization for Matrix-Vector Multiplication 
Example 6-11. Matrix Times a Vector Multiplication

*
*   TITLE MATRIX TIMES A VECTOR MULTIPLICATION
*
*   SUBROUTINE MAT
*
*   MAT == MATRIX TIMES A VECTOR OPERATION
*
*   TYPICAL CALLING SEQUENCE:
*
*       load    AR0
*       load    AR1
*       load    AR2
*       load    AR3
*       load    R1
*       CALL    MAT
*
*   ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+-----------------------------
*   AR0      | ADDRESS OF M(0,0)
*   AR1      | ADDRESS OF V(0)
*   AR2      | ADDRESS OF P(0)
*   AR3      | NUMBER OF ROWS - 1 (K-1)
*   R1       | NUMBER OF COLUMNS - 2 (N-2)
*
*   REGISTERS USED AS INPUT: AR0, AR1, AR2, AR3, R1
*   REGISTERS MODIFIED: R0, R2, AR0, AR1, AR2, AR3, IR0,
*                       RC, RSA, REA
*
*   PROGRAM SIZE: 11
*
*   EXECUTION CYCLES: 6 + 10 * K + K * (N - 1)
*
        .global MAT
*
*   SETUP
*
MAT     LDI     R1,IR0                  ; Number of columns-2 -> IR0
        ADDI    2,IR0                   ; IR0 = N
*
*   FOR (i = 0; i < K; i++) LOOP OVER THE ROWS
*
ROWS    LDF     0.0,R2                  ; Initialize R2
        MPYF3   *AR0++(1),*AR1++(1),R0
*                                       ; m(i,0) * v(0) -> R0
*
*   FOR (j = 1; j < N; j++) DO DOT PRODUCT OVER COLUMNS
*
        RPTS    R1                      ; Multiply a row by a column
*
        MPYF3   *AR0++(1),*AR1++(1),R0  ; m(i,j) * v(j) -> R0
||      ADDF3   R0,R2,R2                ; m(i,j-1) * v(j-1) + R2 -> R2
*
        DBD     AR3,ROWS                ; Counts the no. of rows left
*
        ADDF    R0,R2                   ; Last accumulate
        STF     R2,*AR2++(1)            ; Result -> p(i)
        NOP     *--AR1(IR0)             ; Set AR1 to point to v(0)
*   !!! DELAYED BRANCH HAPPENS HERE !!!
*
*   RETURN SEQUENCE
*
        RETS                            ; Return
*
*   end
*
        .end


6.5 Vector Maximum Search

In vector maximum search, a vector of N elements is searched for its greatest
element:

    max { p(i) }

In pseudo-C format, the search is expressed by:

    max = 0
    max_location = 0
    for (i = 0; i < N; i++) {
        if (max < p[i]) {
            max = p[i];
            max_location = i;
        }
    }

Example 6-12 shows an example.
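One detail of the pseudo-C search is worth illustrating: starting from max = 0 never selects an element when every p(i) is negative. The Python sketch below seeds the search with a vector element instead, as the listing in Example 6-12 does when it loads a vector element before the loop. The function name `vec_max` is an assumption made for this example, and the sketch scans forward rather than in the order used by the listing.

```python
def vec_max(p):
    """Return (maximum value, index of maximum) for a non-empty vector p."""
    max_val = p[0]                 # seed with an element, not with 0
    max_loc = 0
    for i in range(1, len(p)):
        if max_val < p[i]:
            max_val = p[i]
            max_loc = i
    return max_val, max_loc
```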


Example 6-12. Vector Maximum Search

*   vecmax.asm
*
*   Vector maximum search
*
*   EQUATIONS: max = max {p(i)}
*
*   TYPICAL CALLING SEQUENCE:
*
*       load    AR0
*       load    RC
*       load    R1
*       CALL    vecmax
*
*   ARGUMENT ASSIGNMENTS:
*
*   argument | function
*   ---------+----------------------------
*   AR0      | address of vector
*   RC       | length of vector - 2 (N-2)
*   R1       | length of vector - 1 (N-1)
*
*   REGISTERS USED AS INPUT: AR0, R1, RC
*   REGISTERS MODIFIED: R0, R1, AR0, RC
*   REGISTER CONTAINING RESULT: R0 maximum value
*                               R1 index of maximum value
*
*   PROGRAM SIZE: 5 words
*
*   EXECUTION CYCLES: 2 + 3N
*
        .text
vecmax  ldf     *ar0--,r0       ; last value
        rptb    loop
        cmpf3   *ar0,r0         ; Compare input value to maximum
        ldile   rc,r1           ; Write index of loop
loop    ldfle   *ar0--,r0       ; Load new max value
        .end
6.6 Fast Fourier Transforms (FFTs)

Fourier transforms are an important tool often used in digital signal
processing (DSP) systems. The purpose of the transform is to convert
information from the time domain to the frequency domain. The inverse Fourier
transform converts information back to the time domain from the frequency
domain. Computationally efficient implementations of Fourier transforms are
known as fast Fourier transforms (FFTs). The theory of FFTs can be found in
books such as DFT/FFT and Convolution Algorithms, and Digital Signal
Processing Applications With the TMS320 Family.

Fast Fourier transform is a label for a collection of algorithms that
implement efficient conversion from time to frequency domain. Distinctions are
made among FFTs based on the following characteristics:

- Radix-2 or radix-4 algorithms (depending on the size of the FFT butterfly)
- Decimation in time or frequency (DIT or DIF)
- Complex or real FFTs
- FFT length, etc.

Certain ’C3x features that increase the efficiency of numerically intensive
algorithms are particularly well suited for FFTs. The high speed of the device
(33-ns cycle time) makes implementation of real-time algorithms easier, while
floating-point capability eliminates the problems associated with dynamic
range. The powerful indirect-addressing indexing scheme facilitates the access
of FFT butterfly legs with different spans. The repeat block implemented by
the RPTB instruction reduces the looping overhead in algorithms heavily
dependent on loops (such as FFTs). This construct provides the efficiency of
in-line coding in loop form. The FFT reverses the bit order of the output;
therefore, the output must be reordered. This reordering does not require
extra cycles, because the device has a special mode of indirect addressing
(bit-reversed addressing) for accessing the FFT output in the original order.
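The reordering that bit-reversed addressing performs can be illustrated in software. The Python sketch below computes the bit-reversed counterpart of an index; it only illustrates the index mapping, not the ’C3x addressing hardware, and the function name `bit_reverse` is an assumption made for this example.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


# Index order for an 8-point transform (3 address bits).
order = [bit_reverse(i, 3) for i in range(8)]
```

For an 8-point transform, `order` is 0, 4, 2, 6, 1, 5, 3, 7.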


The examples in this section are based on programs contained in the DFT/FFT
and Convolution Algorithms book and in the paper Real-Valued Fast Fourier
Transform Algorithms.

6.6.1 FFT Definition

The FFT is an efficient implementation of the discrete Fourier transform (DFT)
equation:

    X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2 \pi n k / N}

The inverse DFT equation is:

    x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2 \pi n k / N}

The FFT takes advantage of the periodic nature of the complex exponential
e^{-j 2 \pi n k / N} to reduce redundancy and the number of calculations. The
FFT expresses the original DFT using two smaller DFTs of length N/2. This
decomposition is applied until the original DFT has been expressed in terms of
2-point DFTs, which is normally referred to as a radix-2 FFT.

There are two ways this decomposition process occurs:

- By decimation in time, where the signals are split into several shorter
  interleaved sequences (see Figure 6-8).

- By decimation in frequency, where the signals are split into several smaller
  interleaved frequency components (see Figure 6-9).
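The splitting of an N-point DFT into two N/2-point DFTs can be checked numerically. The Python sketch below compares a direct DFT with a recursive radix-2 decimation-in-time FFT; it is a reference illustration rather than an efficient implementation, it assumes the forward transform uses a negative exponent, and the function names `dft` and `fft_radix2` are assumptions made for this example.

```python
import cmath


def dft(x):
    """Direct evaluation of the DFT sum (N^2 complex multiplications)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])            # N/2-point DFT of even samples
    odd = fft_radix2(x[1::2])             # N/2-point DFT of odd samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + tw             # butterfly, upper leg
        out[k + n // 2] = even[k] - tw    # butterfly, lower leg
    return out
```

Both functions should agree to within rounding error for any power-of-2 length.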


Figure 6-8. Decimation in Time for an 8-Point FFT

[Figure not reproduced: three-stage butterfly flow graph. The inputs are taken
in the order x(0), x(4), x(2), x(6), x(1), x(5), x(3), x(7); the outputs
appear in the order X(0) through X(7).]

Figure 6-9. Decimation in Frequency for an 8-Point FFT

[Figure not reproduced: three-stage butterfly flow graph, Stage 1 through
Stage 3.]

6.6.2 Complex Radix-2 DIF FFT

Example 6-13 and Example 6-14 show the implementation of a complex radix-2
DIF FFT on the ’C3x. Example 6-13 contains the generic code of the FFT, which
can be used with an FFT of any length. However, for the complete
implementation of an FFT, you need a table of twiddle factors (sines/cosines);
the length of the table depends on the size of the transform. A table with
twiddle factors (containing 1-1/4 complete cycles of a sine) is presented
separately in Example 6-14 for a 64-point FFT. This retains the generic form
of the radix-2 DIF FFT in Example 6-13. A full sine wave must have the same
number of samples as the length of the FFT. Example 6-14 defines two
variables: N, which is the FFT length, and M, which is the logarithm of N to a
base equal to the radix. In other words, M is the number of stages of the FFT.
For example, in a 64-point FFT, M = 6 when using a radix-2 algorithm, and
M = 3 when using a radix-4 algorithm. If the table with the twiddle factors
and the FFT code are kept in separate files, they are connected at link time.
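A table of this shape can be generated for any transform length. The Python sketch below produces 1-1/4 cycles of a sine with N samples per full cycle, so that the entry N/4 positions further on gives the matching cosine; the function name `twiddle_table` is an assumption made for this example.

```python
import math


def twiddle_table(n: int):
    """Return sin(2*pi*k/n) for k = 0 .. n + n/4 - 1 (1-1/4 sine cycles)."""
    return [math.sin(2.0 * math.pi * k / n) for k in range(n + n // 4)]


table = twiddle_table(64)      # 80 entries for a 64-point FFT
```

Because cos(a) = sin(a + pi/2), `table[k + 16]` holds the cosine that pairs with the sine in `table[k]` for the 64-point case.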
Example 6-13. Complex Radix-2 DIF FFT

*
*   TITLE COMPLEX, RADIX-2, DIF FFT
*
*   GENERIC PROGRAM FOR LOOPED-CODE RADIX-2 FFT COMPUTATION IN TMS320C3x
*
*   THE PROGRAM IS TAKEN FROM THE BURRUS AND PARKS BOOK, P. 111.
*   THE (COMPLEX) DATA RESIDE IN INTERNAL MEMORY. THE COMPUTATION
*   IS DONE IN PLACE, BUT THE RESULT IS MOVED TO ANOTHER MEMORY
*   SECTION TO DEMONSTRATE THE BIT-REVERSED ADDRESSING.
*
*   THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE THAT IS PUT IN A .DATA
*   SECTION. THIS DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE THE
*   GENERIC NATURE OF THE PROGRAM. FOR THE SAME PURPOSE, THE SIZE OF
*   THE FFT N AND LOG2(N) ARE DEFINED IN A .GLOBL DIRECTIVE AND SPECIFIED
*   DURING LINKING.
*
        .globl  FFT                 ; Entry point for execution
        .globl  N                   ; FFT size
        .globl  M                   ; LOG2(N)
        .globl  SINE                ; Address of sine table
INP     .usect  "IN",1024           ; Memory with input data
        .BSS    OUTP,1024           ; Memory with output data
        .text
*
*   INITIALIZE
*
FFTSIZ  .word   N
LOGFFT  .word   M
SINTAB  .word   SINE
INPUT   .word   INP
OUTPUT  .word   OUTP
*
FFT:    LDP     FFTSIZ              ; Command to load data page pointer
        LDI     @FFTSIZ,IR1
        LSH     -2,IR1              ; IR1 = N/4, pointer for SIN/COS table
        LDI     0,AR6               ; AR6 holds the current stage number
        LDI     @FFTSIZ,IR0
        LSH     1,IR0               ; IR0 = 2*N1 (because of real/imag)
        LDI     @FFTSIZ,R7          ; R7 = N2
        LDI     1,AR7               ; Initialize repeat counter
*                                   ;   of first loop
        LDI     1,AR5               ; Initialize IE index (AR5 = IE)
*
*   OUTER LOOP
*
LOOP:   NOP     *++AR6(1)           ; Current FFT stage
        LDI     @INPUT,AR0          ; AR0 points to X(I)
        ADDI    R7,AR0,AR2          ; AR2 points to X(L)
        LDI     AR7,RC
        SUBI    1,RC                ; RC should be one less than desired #
*
*   FIRST LOOP
*
        RPTB    BLK1
        ADDF    *AR0,*AR2,R0        ; R0 = X(I)+X(L)
        SUBF    *AR2++,*AR0++,R1    ; R1 = X(I)-X(L)
        ADDF    *AR2,*AR0,R2        ; R2 = Y(I)+Y(L)
        SUBF    *AR2,*AR0,R3        ; R3 = Y(I)-Y(L)
        STF     R2,*AR0--           ; Y(I) = R2 and...
||      STF     R3,*AR2--           ; Y(L) = R3
BLK1    STF     R0,*AR0++(IR0)      ; X(I) = R0 and...
||      STF     R1,*AR2++(IR0)      ; X(L) = R1 and AR0,2 = AR0,2 + 2*n
*
*   IF THIS IS THE LAST STAGE, YOU ARE DONE
*
        CMPI    @LOGFFT,AR6
        BZD     END
*
*   MAIN INNER LOOP
*
        LDI     2,AR1               ; Init loop counter for
*                                   ;   inner loop
        LDI     @SINTAB,AR4         ; Initialize IA index (AR4 = IA)
INLOP:  ADDI    AR5,AR4             ; IA = IA+IE; AR4 points to
*                                   ;   cosine
        LDI     AR1,AR0
        ADDI    2,AR1               ; Increment inner loop counter
        ADDI    @INPUT,AR0          ; (X(I),Y(I)) pointer
        ADDI    R7,AR0,AR2          ; (X(L),Y(L)) pointer
        LDI     AR7,RC
        SUBI    1,RC                ; RC should be 1 less than
*                                   ;   desired #
        LDF     *AR4,R6             ; R6 = SIN
*
*   SECOND LOOP
*
        RPTB    BLK2
        SUBF    *AR2,*AR0,R2        ; R2 = X(I)-X(L)
        SUBF    *+AR2,*+AR0,R1      ; R1 = Y(I)-Y(L)
        MPYF    R2,R6,R0            ; R0 = R2*SIN and...
||      ADDF    *+AR2,*+AR0,R3      ; R3 = Y(I)+Y(L)
        MPYF    R1,*+AR4(IR1),R3    ; R3 = R1*COS and...
||      STF     R3,*+AR0            ; Y(I) = Y(I)+Y(L)
        SUBF    R0,R3,R4            ; R4 = R1*COS-R2*SIN
        MPYF    R1,R6,R0            ; R0 = R1*SIN and...
||      ADDF    *AR2,*AR0,R3        ; R3 = X(I)+X(L)
        MPYF    R2,*+AR4(IR1),R3    ; R3 = R2*COS and...
||      STF     R3,*AR0++(IR0)
*                                   ; X(I) = X(I)+X(L) and AR0 = AR0+2*N1
        ADDF    R0,R3,R5            ; R5 = R2*COS+R1*SIN
BLK2    STF     R5,*AR2++(IR0)      ; X(L) = R2*COS+R1*SIN,
*                                   ;   incr AR2 and...
||      STF     R4,*+AR2            ; Y(L) = R1*COS-R2*SIN
        CMPI    R7,AR1
        BNE     INLOP               ; Loop back to the inner loop
        LSH     1,AR7               ; Increment loop counter for next time
        BRD     LOOP                ; Next FFT stage (delayed)
        LSH     1,AR5               ; IE = 2*IE
        LDI     R7,IR0              ; N1 = N2
        LSH     -1,R7               ; N2 = N2/2
*
*   STORE RESULT OUT USING BIT-REVERSED ADDRESSING
*
END:    LDI     @FFTSIZ,RC          ; RC = N
        SUBI    1,RC                ; RC should be one less than desired #
        LDI     @FFTSIZ,IR0         ; IR0 = size of FFT = N
        LDI     2,IR1
        LDI     @INPUT,AR0
        LDI     @OUTPUT,AR1
        RPTB    BITRV
        LDF     *+AR0(1),R0
||      LDF     *AR0++(IR0)B,R1
BITRV   STF     R0,*+AR1(1)
||      STF     R1,*AR1++(IR1)
SELF    BR      SELF                ; Branch to itself at the end
        .end
Example 6-14. Table With Twiddle Factors for a 64-Point FFT


*
*  TITLE TABLE WITH TWIDDLE FACTORS FOR A 64-POINT FFT
*
*  FILE TO BE LINKED WITH THE SOURCE CODE FOR A 64-POINT, RADIX-2 FFT
*
        .globl  SINE
        .globl  N
        .globl  M

N       .set    64
M       .set    6

        .data

SINE
        .float  0.000000
        .float  0.098017
        .float  0.195090
        .float  0.290285
        .float  0.382683
        .float  0.471397
        .float  0.555570
        .float  0.634393
        .float  0.707107
        .float  0.773010
        .float  0.831470
        .float  0.881921
        .float  0.923880
        .float  0.956940
        .float  0.980785
        .float  0.995185

COSINE
        .float  1.000000
        .float  0.995185
        .float  0.980785
        .float  0.956940
        .float  0.923880
        .float  0.881921
        .float  0.831470
        .float  0.773010
        .float  0.707107
        .float  0.634393
        .float  0.555570
        .float  0.471397
        .float  0.382683
        .float  0.290285
        .float  0.195090
        .float  0.098017
        .float  0.000000
        .float  -0.098017
        .float  -0.195090
        .float  -0.290285
        .float  -0.382683
        .float  -0.471397
        .float  -0.555570
        .float  -0.634393
        .float  -0.707107
        .float  -0.773010
        .float  -0.831470
        .float  -0.881921
        .float  -0.923880
        .float  -0.956940
        .float  -0.980785
        .float  -0.995185
        .float  -1.000000
        .float  -0.995185
        .float  -0.980785
        .float  -0.956940
        .float  -0.923880
        .float  -0.881921
        .float  -0.831470
        .float  -0.773010
        .float  -0.707107
        .float  -0.634393
        .float  -0.555570
        .float  -0.471397
        .float  -0.382683
        .float  -0.290285
        .float  -0.195090
        .float  -0.098017
        .float  0.000000
        .float  0.098017
        .float  0.195090
        .float  0.290285
        .float  0.382683
        .float  0.471397
        .float  0.555570
        .float  0.634393
        .float  0.707107
        .float  0.773010
        .float  0.831470
        .float  0.881921
        .float  0.923880
        .float  0.956940
        .float  0.980785
        .float  0.995185
6.6.3 Complex Radix-4 DIF FFT


The radix-2 algorithm has tutorial value because the functioning of the FFT 
algorithm is relatively easy to understand. However, a radix-4 implementation 
can increase execution speed by reducing the amount of arithmetic required. 
Example 6-15 shows the generic implementation of a complex DIF FFT in 
radix-4. Its companion table, such as the one in Example 6-14, must define 
M as log4(N), the base-4 logarithm of the FFT size N, rather than log2(N). 
For N = 64, M is therefore 3 instead of 6. 
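
As a rough illustration of what such a companion table contains, the following Python sketch regenerates the entries of Example 6-14 and the value of M for both radices. It assumes, based on the listing in Example 6-14, that the table holds 5N/4 samples of sin(2*pi*k/N) with the COSINE label falling N/4 entries after SINE. The function names `twiddle_table` and `log_radix` are hypothetical and are not part of the TI sources.

```python
import math


def twiddle_table(n):
    """Return 5n/4 samples of sin(2*pi*k/n); the cosine part starts at offset n/4."""
    return [math.sin(2.0 * math.pi * k / n) for k in range(5 * n // 4)]


def log_radix(n, radix):
    """Return M such that radix**M == n; raise ValueError if n is not a power of radix."""
    m = 0
    size = 1
    while size < n:
        size *= radix
        m += 1
    if size != n:
        raise ValueError(f"{n} is not a power of {radix}")
    return m


if __name__ == "__main__":
    n = 64
    print("M for radix-2:", log_radix(n, 2))
    print("M for radix-4:", log_radix(n, 4))
    print("SINE")
    for k, value in enumerate(twiddle_table(n)):
        if k == n // 4:
            print("COSINE")
        print(f"        .float  {value:9.6f}")
```

Because `log_radix` rejects sizes that are not an exact power of the radix, it also shows why a radix-4 routine cannot be used directly for sizes such as 32 or 128.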


Example 6-15. Complex Radix-4 DIF FFT


*
*  TITLE COMPLEX, RADIX-4, DIF FFT
*
*  GENERIC PROGRAM TO PERFORM A LOOPED-CODE RADIX-4 FFT COMPUTATION
*  IN THE TMS320C3x
*
*  THE PROGRAM IS TAKEN FROM THE BURRUS AND PARKS BOOK, P. 117.
*  THE (COMPLEX) DATA RESIDE IN INTERNAL MEMORY, AND THE COMPUTATION
*  IS DONE IN PLACE. THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE THAT
*  IS PUT IN A .DATA SECTION. THIS DATA IS INCLUDED IN A SEPARATE FILE
*  TO PRESERVE THE GENERIC NATURE OF THE PROGRAM. FOR THE SAME PURPOSE,
*  THE SIZE OF THE FFT N AND LOG4(N) ARE DEFINED IN A .GLOBL DIRECTIVE
*  AND SPECIFIED DURING LINKING.
*
*  IN ORDER TO HAVE THE FINAL RESULT IN BIT-REVERSED ORDER, THE TWO
*  MIDDLE BRANCHES OF THE RADIX-4 BUTTERFLY ARE INTERCHANGED DURING
*  STORAGE. NOTE THIS DIFFERENCE WHEN COMPARING WITH THE PROGRAM IN
*  P. 117 OF THE BURRUS AND PARKS BOOK.
*
        .globl  FFT             ; Entry point for execution
        .globl  N               ; FFT size
        .globl  M               ; LOG4(N)
        .globl  SINE            ; Address of sine table

INP     .usect  "IN",1024       ; Memory with input data

        .text

*  INITIALIZE

TEMP    .word   $+2
STORE   .word   FFTSIZ          ; Beginning of temp storage area
        .word   N
        .word   M
        .word   SINE
        .word   INP
        .BSS    FFTSIZ,1        ; FFT size
        .BSS    LOGFFT,1        ; LOG4(FFTSIZ)
        .BSS    SINTAB,1        ; Sine/cosine table base
        .BSS    INPUT,1         ; Area with input data to process
        .BSS    STAGE,1         ; FFT stage #
        .BSS    RPTCNT,1        ; Repeat counter
        .BSS    IEINDX,1        ; IE index for sine/cosine
        .BSS    LPCNT,1         ; Second-loop count
        .BSS    JT,1            ; JT counter in program, P. 117
        .BSS    IA1,1           ; IA1 index in program, P. 117
FFT:
*  INITIALIZE DATA LOCATIONS
        LDP     TEMP            ; Command to load data page pointer
        LDI     @TEMP,AR0
        LDI     @STORE,AR1
        LDI     *AR0++,R0       ; Xfer data from one memory to the other
        STI     R0,*AR1++
        LDI     *AR0++,R0
        STI     R0,*AR1++
        LDI     *AR0++,R0
        STI     R0,*AR1++
        LDI     *AR0,R0
        STI     R0,*AR1
        LDP     FFTSIZ          ; Command to load data page pointer
        LDI     @FFTSIZ,R0
        LDI     @FFTSIZ,IR0
        LDI     @FFTSIZ,IR1
        LDI     0,AR7
        STI     AR7,@STAGE      ; @STAGE holds the current stage number
        LSH     1,IR0           ; IR0 = 2*N1 (because of real/imag)
        LSH     -2,IR1          ; IR1 = N/4, pointer for SIN/COS table
        LDI     1,AR7
        STI     AR7,@RPTCNT     ; Init repeat counter of first loop
        STI     AR7,@IEINDX     ; Init. IE index
        LSH     -2,R0
        ADDI    2,R0
        STI     R0,@JT          ; JT = R0/2+2
        SUBI    2,R0
        LSH     1,R0            ; R0 = N2
*  OUTER LOOP
LOOP:
        LDI     @INPUT,AR0      ; AR0 points to X(I)
        ADDI    R0,AR0,AR1      ; AR1 points to X(I1)
        ADDI    R0,AR1,AR2      ; AR2 points to X(I2)
        ADDI    R0,AR2,AR3      ; AR3 points to X(I3)
        LDI     @RPTCNT,RC
        SUBI    1,RC            ; RC should be one less than desired #
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Example 6-15. Complex Radix-4 DIF FFT (Continued) 


* FIRST LOOP 


RPTB BLK1 
ADDF *+ARO, *+AR2,R1 
* 
ADDF *+AR3, *+AR1,R3 
* 
ADDF R3,R1,R6 
SUBF *+AR2, *+ARO,R4 
* 
STF R6, *+ARO 
SUBF R3,R1 
LDF *AR2,R5 
LDF *+AR1,R7 
ADDF *AR3, *AR1,R3 
ADDF R5,*ARO,R1 
STF R1,*+AR1 
ADDF R3,R1,R6 
SUBF R5, *ARO, R2 
STF R6, *ARO++ (IRO) 
SUBF R3,R1 
SUBF *AR3, *AR1,R6 
SUBF R7, *+AR3,R3 
STF R1, *AR1++(IRO) 
SUBF R6,R4,R5 
ADDF R6,R4 
STF R5, *+AR2 
STF R4,*+AR3 
SUBF R3,R2,R5 
ADDF R3,R2 
BLK1 STF R5, *AR2++(IRO) 
STF R2, *AR3++ (IRO) 
* IF THIS IS THE LAST STAGI 
LDI @STAGE, AR7 
ADDI 1,AR7 
CMPI @LOGFFT, AR7 
BZD END 
STI AR7, @STAGE 
* MAIN INNER LOOP 
LDI 1,AR7 
STI AR7, @IA1 
LDI 2,AR7 
STI AR7, @LPCNT 
LDI 2,AR6 
ADDI @LPCNT, AR6 
LDI @LPCNT, ARO 
LDI @IA1,AR7 


s 


* Rl = Y(I)+Y(1I2) 
; R383 = Y(I1)+Y(I3) 
; R6 = R1+R3 

; R4 = Y(1I)+y¥(12) 
; Y(I) = R1+R3 

; Rl = R1=R3 

; RS = X(12) 

; R7 = Y(I1) 

; R383 = X(1I1)+X(I3) 
; Rl = X(I)+X(I2) 
; Y(I1) = R1tR3 

; R6 = R1+R3 

; R2 = X(1I)#X(I2) 
; X(I) = R1+R3 

; Rl = R1=R3 

; R6 = X(I1)+X(I3) 
; R3 = Y(I1)+#Y(I3) 
; X(I1) = R1ItR3 

; RS = R4tR6 

; R4 = R4+R6 

; Y(I2) = R4tR6 

+ Y(I3) = R4+R6 

; RS = R2=R3 

;  R2 = R2+R3 

; X(I2) = R2tR3 

; &X(I3) = R2+R3 
YOU ARE DONE 


7 Current FFT stage 


A Init IA1l index 


H Init loop counter for inner loop 
; INLOP : 
; Increment inner loop counter 




ADDI @IEINDX,AR7 
ADDI @INPUT,ARO 
STI ART, @IA1 
ADDI R0O,ARO,AR1 
STI AR6, @LPCNT 
ADDI R0O,AR1,AR2 
ADDI R0O,AR2,AR3 
LDI @RPTCNT, RC 
SUBI 1,RC 

CMPI @JT,AR6 

BZD SPCL 

LDI @IA1,AR7 
LDI @IA1,AR4 
ADDI @SINTAB, AR4 
SUBI 1,AR4 

ADDI AR4,AR7,AR5 
SUBI 1,AR5 

ADDI AR7,AR5,AR6 
SUBI 1,AR6 


* SECOND LOOP 


RPTB BLK2 

ADDF *+AR2,*+AR0,R3 
* 

ADDF *+AR3,*+AR1,R5 
* 

ADDF R5,R3,R6 

SUBF *+AR2,*+ARO,R4 
* 

SUBF  R5,R3 

ADDF *AR2,*ARO,RI1 

ADDF *AR3,*AR1,R5 

MPYF R3,*+AR5(IR1),R6 
| | STF R6, *+ARO 

ADDF R5,R1,R7 

SUBF *AR2,*ARO,R2 

SUBF R5,R1 

MPYF R1,*AR5,R7 
| | STF R7, *ARO++ (IRO) 

SUBF R7,R6 

SUBF *+AR3,*+AR1,R5 
* 

MPYF R1,*+AR5(IR1),R7 
| | STF R6,*+AR1 

MPYF R3,*AR5,R6 

ADDF  R7,R6 

ADDF R5,R2,R1 

SUBF R5,R2 

SUBF *AR3,*AR1,R5 

SUBF R5,R4,R3 

ADDF R5,R4 

MPYF R3,*+AR4(IR1),R6 


IA1 = IAI1+IE 


(X(I),Y(I)) pointer 

(X(I1),Y(I1)) pointer 
(X(I2),Y(1I2)) pointer 
(X(I3),Y(1I3)) pointer 


RC should be one less than desired # 
If LPCNT = JT, go to 
special butterfly 


Create cosine index AR4 
Adjust sine table pointer 


IA2 IA1+IA1+1 


IA3 IA2+IA1+1 


R3° = Y¥(T)4+¥(E2) 


R5 = Y(1I1)+yY¥(1I3) 
R6 = R3+R5 


R4. = Y(T)4EY( 12) 

R3 = R3iR5 

Rl = X(I)+X(I2) 

R5 = X(1I1)+X(I3) 


= R3*CO2 

Y(I) = R3+R5 

R7 = R1+R5 

R2 = X(1I)#X(I2) 

Rl = R1=R5 

R7 = R1*SI2 

X(I) = R1+R5 

R6 = R3*CO2+R1*SI2 


R5 = Y(I1)+yY (13) 


R7 = R1*C02 

Y(I1) = R3*CO2+R1*S12 
R6 = R3*SI2 

R6 = R1*CO2+R3*S12 

Rl = R2+R5 

R2 = R2+R5 

R5 = X(1I1)+x(13) 

R3 = R4+R5 

R4 = R4+R5 

R6 = R3*COl 
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Example 6-15. Complex Radix-4 DIF FFT (Continued) 


MPYF 
SUBF 
MPYF 


MPYF 
ADDF 
MPYF 


MPYF 
SUBF 
MPYF 


STF R6, *AR1++ (IRO) 


R1,*AR4,R7 
R7,R6 


R3,*AR4,R7 
R7,R6 


R2,*AR6,R7 
R7,R6 


MPYF  R4,*AR6,R7 

ADDF R7,R6 
BLK2 STF R6, *AR3++(IRO) 
* 

CMPI @LPCNT,RO 

BP INLOP 

BR CONT 
* SPECIAL BUTTERFLY FOR W = 
SPCL LDIIR1,AR4 

LSH+1,AR4 

ADDI @SINTAB,AR4 

RPTB BLK3 

ADDF *AR2,*ARO,R1 

SUBF *AR2,*ARO,R2 

ADDF *+AR2,*+AR0,R3 
* 

SUBF *+AR2,*+ARO,R4 
* 

ADDF *AR3,*AR1,R5 

SUBF R1,R5,R6 

ADDF R5,R1 

ADDF *+AR3,*+AR1,R5 
* 

SUBF R5,R3,R7 

ADDF R5,R3 

STF R3, *+ARO 
| | STF R1, *ARO++(IRO) 

SUBF *AR3,*AR1,R1 

SUBF *+AR3,*+AR1,R3 
* 

STF R6, *+AR1 


R1, *+AR4(IR1),R6 ; 
STF R6, *+AR2 ; 


R4,*+AR6(IR1),R6 ; 
STF R6, *AR2++ (IRO) 


R2,*+AR6(IR1),R6 ; 
STF R6, *+AR3 ; 


X(I1) = R1*CO2+R3*SI2 
R7 = R1*SI1 

R6 = R3*CO1+R1*SI1 

R6 = R1*COl 

Y(I2) = R3*CO1+R1*SI1 
R7 = R3*SI1 

R6 = R1*C O1+R3*ST1 
R6 = R4*CO3 

X(I2) = R1*CO1+R3*SI1 
R7 = R2*SI3 

R6 = R4*CO3+R2*SI3 

R6 = R2*CO3 

Y(I3) = R4*CO3tR2*SI3 
R7 = R4*SI3 

R6 = R2*CO3+R4*S13 


x(13) = R2*CO3+R4*SI3 


Loop back to the inner loop 


Point to SIN(45) 


Create cosine index AR4 = CO21 
Rl = X(1I)+X(1I2) 
R2 = X(I)+X(12) 


R3 = Y(1I)+Y (12) 


R4 = Y(1I)+Y¥ (12) 
R5 = X(I1)+X(I3) 
R6 = R5tR1 
Rl = R1+R5 


R5 = Y¥(1I1)+Y (13) 


R7 = R3tR5 

R3 = R3+R5 

Y(I) = R3+R5 
X(I) = R1+R5 

Rl = X(I1)+X(1I3) 


R3 = Y(I1)+yY(1I3) 
Y(I1) = R5tR1 
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Example 6—15. Complex Radix-4 DIF FFT (Continued) 


v 


X(I1) = R3R5 

R5 = R2+R3 

R2 = +R2+R3 

R3 = R4tR1 

R4 = R4+R1 

Rl = R3iR5 

Rl = R1*CO21 

R3 = R3+R5 

R3 = R3*CO21 

Y(I2) = (R3ER5) *CO21 
Rl = R2tR4 

Rl = R1*CO21 

X(I2) = (R3+R5) *CO21 
R2 = R2+R4 

R2 = R2*CO21 

Y¥(I3) = £(R4#R2) *CO21 
X(I3) = (R4+R2) *CO21 


Loop back to the inner loop 


Increment repeat counter for 
next time 


IE 4*IE 


N1 = N2 


JT = N2/2+2 


N2 = N2/4 
Next FFT stage 


| | STF R7, *AR1++(IRO) 
ADDF  R3,R2,R5 
SUBF R2,R3,R2 
SUBF R1,R4,R3 
ADDF  R1,R4 
SUBF R5,R3,R1 
MPYF *AR4,R1 
ADDF R5,R3 
MPYF *AR4,R3 
STF R1, *+AR2 
SUBF R4,R2,R1 
MPYF *AR4,RI1 
STF R3, *AR2++(IRO) 
ADDF R4,R2 
MPYF *AR4,R2 

BLK3 STF R1, *+AR3 
STF R2, *AR3++ (IRO) 
CMPI @LPCNT,RO 
BPD INLOP 

CONT LDI @RPTCNT,AR7 
LDI @IEINDX, AR6 
LSH 2,AR7 

* 
STI AR7, @RPTCNT 
LSH 2,AR6 
STI AR6, @IEINDX 
LDI RO, IRO 
LSH -3,R0 
ADDI 2,R0 
STI RO, @JT 
SUBI 2,R0 
LSH 1,R0 
BR LOOP 

* STORE RESULT USING BITIR 


EV. 


@FFTSIZ,RC 
1,RC 
@FFTSIZ, IRO 
2,IR1 
@INPUT, ARO 
STORE 
@STORE, AR1 
BITRV 
*+ARO(1),RO 
*ARO++(IRO)B,R1 
RO, *+AR1 (1) 

R1, *AR1++ (IR1) 


ELF 


, 


ERSED ADDRESSING 


RC = N 
RC should be one less than desired # 
IRO = size of FFT =N 


Branch to itself at the end 
6.6.4 Real Radix-2 FFT


In many cases, the data to be transformed is a sequence of real numbers. 
Real input data has symmetry properties that reduce the computational load of 
the FFT algorithm even further. An FFT algorithm that exploits these properties 
is called a real radix-2 FFT. Example 6-16 shows the generic implementation 
of a real-valued, forward radix-2 FFT. For such an FFT, the total storage required 
for a length-N transform is only N locations; a complex FFT requires 2N locations. 
The remaining points of the spectrum are recovered from the conjugate-symmetry 
conditions. 
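
To make the storage claim concrete, the following Python sketch packs the spectrum of a real sequence into N locations and then recovers all N complex points through conjugate symmetry. It assumes the output layout described in the header of Example 6-16, taken here to be R(0), R(1), ..., R(N/2) followed by I(N/2-1), ..., I(1). It uses a plain O(N^2) DFT as a reference transform rather than the radix-2 algorithm of the listing, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT, used here only as a reference transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def pack_real_spectrum(spectrum):
    """Pack a conjugate-symmetric spectrum into N real values:
    R(0), R(1), ..., R(N/2), I(N/2-1), ..., I(1)."""
    n = len(spectrum)
    half = n // 2
    reals = [spectrum[k].real for k in range(half + 1)]
    imags = [spectrum[k].imag for k in range(half - 1, 0, -1)]
    return reals + imags


def unpack_real_spectrum(packed):
    """Rebuild all N complex points using X(N-k) = conj(X(k))."""
    n = len(packed)
    half = n // 2
    spectrum = [0j] * n
    spectrum[0] = complex(packed[0], 0.0)        # X(0) is purely real
    spectrum[half] = complex(packed[half], 0.0)  # X(N/2) is purely real
    for k in range(1, half):
        value = complex(packed[k], packed[n - k])  # I(k) is stored at index N-k
        spectrum[k] = value
        spectrum[n - k] = value.conjugate()
    return spectrum


if __name__ == "__main__":
    samples = [math.sin(0.3 * t) + 0.5 * t for t in range(8)]
    packed = pack_real_spectrum(dft(samples))
    print(len(packed), "real values hold the whole spectrum")
```

The packed array has the same length as the real input, which is why the transform can be done in place in N locations.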


Example 6-16. Real Forward Radix-2 FFT


******************************************************************************
*
*  FILENAME   : ffft_rl.asm
*
*  WRITTEN BY : Alex Tessarolo
*               Texas Instruments, Australia
*
*  DATE       : 23rd July 1991
*
*  VERSION    : 2.0
*
******************************************************************************
*  VER   DATE           COMMENTS
*  ---   ------------   ----------------------------------------------
*  1.0   18th July 91   Original release.
*  2.0   23rd July 91   Most stages modified.
*                       Minimum FFT size increased from 32 to 64.
*                       Faster in place bit reversing algorithm.
*                       Program size increased by about 100 words.
*                       One extra data word required.
******************************************************************************
*  SYNOPSIS: int ffft_rl( FFT_SIZE, LOG_SIZE, SOURCE_ADDR, DEST_ADDR,
*                         SINE_TABLE, BIT_REVERSE );
*
*            int   FFT_SIZE      ; 64, 128, 256, 512, 1024,
*            int   LOG_SIZE      ; 6, 7, 8, 9, 10,
*            float *SOURCE_ADDR  ; Points to location of source data.
*            float *DEST_ADDR    ; Points to where data will be
*                                ; operated on and stored.
*            float *SINE_TABLE   ; Points to the SIN/COS table.
*            int   BIT_REVERSE   ; = 0, bit reversing is disabled.
*                                ; <> 0, bit reversing is enabled.
*
*  NOTE: 1) If SOURCE_ADDR = DEST_ADDR, then in-place bit
*           reversing is performed, if enabled (more
*           processor intensive).
*        2) FFT_SIZE must be >= 64 (this is not checked).
*
*  DESCRIPTION: Generic function to do a radix-2 FFT computation on the C30.
*               The data array is FFT_SIZE-long with only real data. The
*               output is stored in the same locations with real and
*               imaginary points R and I as follows:
*
*               DEST_ADDR[0]            -> R(0)
*                                          R(1)
*                                          .
*                                          .
*                                          R(FFT_SIZE/2)
*                                          I(FFT_SIZE/2 - 1)
*                                          .
*                                          .
*                                          I(2)
*               DEST_ADDR[FFT_SIZE - 1] -> I(1)
*


The program is based on the FORTRAN program in the 
paper by Sorensen et al., June 1987 issue of Trans. 
on ASSP. 


Bit reversal is optionally implemented at the begin- 
ning of the function. 

If bit reversal is selected (bit reverse # 0), the data 
input is expected in bit-reverse order 

*               The sine/cosine table for the twiddle factors is expected
*               to be supplied in the following format:
*
*               SINE_TABLE[0]              -> sin(0*2*pi/FFT_SIZE)
*                                             sin(1*2*pi/FFT_SIZE)
*                                             .
*                                             .
*                                             sin((FFT_SIZE/2-2)*2*pi/FFT_SIZE)
*               SINE_TABLE[FFT_SIZE/2 - 1] -> sin((FFT_SIZE/2-1)*2*pi/FFT_SIZE)
*
*               NOTE: The table is the first half period of a sine wave.
*
*               Stack structure upon call:
*
*                       -FP(7) | BIT_REVERSE |
*                       -FP(6) | SINE_TABLE  |
*                       -FP(5) | DEST_ADDR   |
*                       -FP(4) | SOURCE_ADDR |
*                       -FP(3) | LOG_SIZE    |
*                       -FP(2) | FFT_SIZE    |
*                       -FP(1) | return addr |
*                       -FP(0) | old FP      |
*
******************************************************************************
*
*  NOTE:    Calling C program can be compiled using either large
*           or small model.
*
*  WARNING: DP initialized only once in the program. Be wary
*           with interrupt service routines. Make sure interrupt
*           service routines save the DP pointer.
*
*  WARNING: The DEST_ADDR must be aligned such that the first
*           LOG_SIZE bits are zero (this is not checked by the
*           program).
*
******************************************************************************
*
*  REGISTERS USED: R0, R1, R2, R3, R4, R5, R6, R7
*                  AR0, AR1, AR2, AR3, AR4, AR5, AR6, AR7
*                  IR0, IR1
*                  RC, RS, RE
*                  DP
*
*  MEMORY REQUIREMENTS: Program = 405 Words (approximately)
*                       Data    = 7 Words
*                       Stack   = 12 Words
*
******************************************************************************
*
*  BENCHMARKS: Assumptions - Program in RAM0
*                          - Reserved data in RAM0
*                          - Stack on primary/expansion bus RAM
*                          - Sine/cosine tables in RAM0
*                          - Processing and data destination in RAM1.
*                          - Primary/expansion bus RAM, 0 wait state.
*
*              FFT Size   Bit Reversing   Data Source   Cycles (C30)
*              --------   -------------   -----------   -------------
*              1024       OFF             RAM1          19816 approx.
*
*  Note: This number does not include the C callable overheads.
*        Add 57 cycles for these overheads.
*
******************************************************************************
FP           .set    AR3
             .global _ffft_rl        ; Entry execution point.
FFT_SIZE:    .usect  ".fftdata",1    ; Reserve memory for arguments.
LOG_SIZE:    .usect  ".fftdata",1
SOURCE_ADDR: .usect  ".fftdata",1
DEST_ADDR:   .usect  ".fftdata",1
SINE_TABLE:  .usect  ".fftdata",1
BIT_REVERSE: .usect  ".fftdata",1
SEPARATION:  .usect  ".fftdata",1
        .sect   ".ffttext"

_ffft_rl:
        PUSH    FP                  ; Initialize C function.
        LDI     SP,FP

        PUSH    R4                  ; Preserve C environment.
        PUSH    R5
        PUSH    R6
        PUSHF   R6
        PUSH    R7
        PUSHF   R7
        PUSH    AR4
        PUSH    AR5
        PUSH    AR6
        PUSH    AR7
        PUSH    DP

        LDP     FFT_SIZE            ; Init. DP pointer.

        LDI     *-FP(2),R0          ; Move arguments from stack.
        STI     R0,@FFT_SIZE
        LDI     *-FP(3),R0
        STI     R0,@LOG_SIZE
        LDI     *-FP(4),R0
        STI     R0,@SOURCE_ADDR
        LDI     *-FP(5),R0
        STI     R0,@DEST_ADDR
        LDI     *-FP(6),R0
        STI     R0,@SINE_TABLE
        LDI     *-FP(7),R0
        STI     R0,@BIT_REVERSE
;
;       Check bit reversing mode (on or off).
;
;       BIT_REVERSING = 0, then OFF (no bit reversing).
;       BIT_REVERSING <> 0, then ON.
;
        LDI     @BIT_REVERSE,R0
        CMPI    0,R0
        BZ      MOVE_DATA
;
;       Check bit reversing type.
;
;       If SourceAddr = DestAddr, then in place bit reversing.
;       If SourceAddr <> DestAddr, then standard bit reversing.
;
        LDI     @SOURCE_ADDR,R0
        CMPI    @DEST_ADDR,R0
        BEQ     IN_PLACE
;
;       Bit reversing Type 1 (from source to
;       destination).
;
;       NOTE: abs(SOURCE_ADDR - DEST_ADDR)
;             must be > FFT_SIZE, this is not
;             checked.
;
        LDI     @FFT_SIZE,R0
        SUBI    2,R0
        LDI     @FFT_SIZE,IR0
        LSH     -1,IR0              ; IR0 = half FFT size.
        LDI     @SOURCE_ADDR,AR0
        LDI     @DEST_ADDR,AR1
        LDF     *AR0++,R1
        RPTS    R0
        LDF     *AR0++,R1
||      STF     R1,*AR1++(IR0)B
        STF     R1,*AR1++(IR0)B
        BR      START
;
;       In-place bit reversing.
;
;       Bit reversing on even locations,
;       1st half only.
;
IN_PLACE:
        LDI     @FFT_SIZE,IR0
        LSH     -2,IR0              ; IR0 = quarter FFT size.
        LDI     2,IR1
        LDI     @FFT_SIZE,RC
        LSH     -2,RC
        SUBI    3,RC
        LDI     @DEST_ADDR,AR0
        LDI     AR0,AR1
        LDI     AR0,AR2
        NOP     *AR1++(IR0)B
        NOP     *AR2++(IR0)B
        LDF     *++AR0(IR1),R0
        LDF     *AR1,R1
        CMPI    AR1,AR0             ; Xchange locs only if AR0<AR1.
        LDFGT   R0,R1
        LDFGT   *AR1++(IR0)B,R1
        RPTB    BITRV1
        LDF     *++AR0(IR1),R0
||      STF     R0,*AR0
        LDF     *AR1,R1
||      STF     R1,*AR2++(IR0)B
        CMPI    AR1,AR0
        LDFGT   R0,R1
BITRV1: LDFGT   *AR1++(IR0)B,R0
        STF     R0,*AR0
        STF     R1,*AR2
;
;       Perform bit reversing on odd
;       locations, 2nd half only.
;
        LDI     @FFT_SIZE,RC
        LSH     -1,RC
        LDI     @DEST_ADDR,AR0
        ADDI    RC,AR0
        ADDI    1,AR0
        LDI     AR0,AR1
        LDI     AR0,AR2
        LSH     -1,RC
        SUBI    3,RC
        NOP     *AR1++(IR0)B
        NOP     *AR2++(IR0)B
        LDF     *++AR0(IR1),R0
        LDF     *AR1,R1
        CMPI    AR1,AR0             ; Xchange locs only if AR0<AR1.
        LDFGT   R0,R1
        LDFGT   *AR1++(IR0)B,R1
        RPTB    BITRV2
        LDF     *++AR0(IR1),R0
||      STF     R0,*AR0
        LDF     *AR1,R1
||      STF     R1,*AR2++(IR0)B
        CMPI    AR1,AR0
        LDFGT   R0,R1
BITRV2: LDFGT   *AR1++(IR0)B,R0
        STF     R0,*AR0
        STF     R1,*AR2
;
;       Perform bit reversing on odd
;       locations, 1st half only.
;
        LDI     @FFT_SIZE,RC
        LSH     -1,RC
        LDI     RC,IR0
        LDI     @DEST_ADDR,AR0
        LDI     AR0,AR1
        ADDI    1,AR0
        ADDI    IR0,AR1
        LSH     -1,RC
        LDI     RC,IR0
        SUBI    2,RC
        LDF     *AR0,R0
        LDF     *AR1,R1
        RPTB    BITRV3
        LDF     *++AR0(IR1),R0
||      STF     R0,*AR1++(IR0)B
BITRV3: LDF     *AR1,R1
||      STF     R1,*-AR0(IR1)
        STF     R0,*AR1
        STF     R1,*AR0
        BR      START
;
;       Check data source locations.
;
;       If SourceAddr = DestAddr, then
;       do nothing.
;       If SourceAddr <> DestAddr, then move
;       data.
;
MOVE_DATA:
        LDI     @SOURCE_ADDR,R0
        CMPI    @DEST_ADDR,R0
        BEQ     START
        LDI     @FFT_SIZE,R0
        SUBI    2,R0
        LDI     @SOURCE_ADDR,AR0
        LDI     @DEST_ADDR,AR1
        LDF     *AR0++,R1
        RPTS    R0
        LDF     *AR0++,R1
||      STF     R1,*AR1++
        STF     R1,*AR1
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Example 6—16. Real Forward Radix-2 FFT (Continued) 


; Perform first and second FFT loops. 


’ — 


START: 


AR1 
AR2 
AR3 
AR4 


AR1 


LOOP1_2: 


vvy 


EL 


I2 


13 


14 


yj 


ry 


NDPNPNNNNNPE 
CG 
Ww 
Ry] 
WwW 


nnn 
744 
ty 


STF 


®e WD NY FF OO 


¢ [X(I1) X(I2)] + [X(I3) + X(14)] 
<¢ [X(I1) X (12) ] 
¢ [X(I1) X(I2)] - [X(I3) + X(I4)] 
¢# -[X(1I3) X (14) ] 
@DEST_ADDR, AR1 
AR1,AR2 
AR1,AR3 
AR1,AR4 
1,AR2 
2,AR3 
3,AR4 
4,IR0 
@FFTI_SIZE,RC 
-2,RC 
2,RC 
*AR2,RO ; RO = X(I2) 
*AR3,R1 ; Rl = X(I3) 
R1, *AR4,R4 ; R4 = X(I3) + X(14) 
R1, *AR4++(IRO),R5 ; RS = -[X(1I3) - X(14)]— 
RO, *AR1,R6 ; R6 = X(I1) - X(1I2) — J 
RO, *AR1++(IRO),R7 ; R7 = X(I1) + X(I2) 
R7,R4,R2 ; R2 = R7 + R4 
R4,R7,R3 ; R3 = R7 - R4— 
LOOP 1_2 i 


R1,*AR4,R4 


RO, *AR1,R6 


R7,R4,R2 
R4,R7,R3 


R3, *AR3 


R6, *AR2 


*+AR2 (IRO),RO 
*+AR3 (IRO),R1 


R3, *AR3++(IRO) 
R1,*AR4++(IRO),R5  ; 
R5,*-AR4 (IRO) 
R6, *AR2++(IRO) 


RO, *AR1++(IRO),R7 ; 
R2, *-AR1 (IRO) 


R5,*-AR4 (IRO) 


R2, *-AR1 (IRO) 


p X(13)-—— 


; X(14)< 


; X(12) <4 


; X(I1)< 




Perform third FFT loop. 


Part A: 


AR1 


AR2 


AR3 


LOOP3_A: 


Il 


I2 


13 


14 


Oo <€ X(I1) 
1 
Zz 
3 
4 4 X(I1) 
5 
6 6<€ -X (14) 
7 
8 
9 


@DEST_ADDR, AR1 
AR1,AR2 
AR1,AR3 

4, BAR2 

6,AR3 

8, IRO 
@FFT_SIZE,RC 
=3, RC 

2,RC 


*AR2,*AR1,R1 
*AR2,*AR1,R2 
*AR3,R3 


LOOP3_A 

*+AR2 (IRO),RO 
R2, *AR1++(IRO) 
RO, *AR1,R1 

R1, *AR2++(IRO) 
RO, *AR1,R2 

R3, *AR3++ (IRO) 
*AR3,R3 


R2,*ARL 
R1, *AR2 
R3, *AR3 


+ 


X (13) 


X (13) 


; RO = X(I3) 

; RI = X(I1) - X(13) 4 
R2 = X(I1) + X(I3)7 
R3 = -X(I4)7 

X(I1) 4 

; X(13) 4 
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Real Forward Radix-2 FFT (Continued) 


; Part B: 
: — 

r 

; ARO 
r 

; ARI 
, 

r AR2 
r 

7 AR3 
rg 

i ~~ ARO 


vvevweeev=eoese$s 


ADDI 
ADDI 
ADDI 
ADDI 


0 

1 @ X[I1] + [X(1I3)*COS+ X(I4) *COS] 
2 

3 @ xX[I1] - [X(13)*COS+ X(14) *COS] 
4 

5 @ -X[I2] - [X(1I3)*COS-— X(14) *COS] 
6 

7 @ xX[I2] - [X(1I3)*COS-— X(1I4) *COSs] 
8 

9 NOTE: COS (2*pi/8) = SIN(2*pi/8) 
@FFT_SIZE,RC 

=3;,.RG 

RC, IR1 

3,RC 

8, IRO 

@DEST_ADDR, ARO 

ARO, AR1 

ARO, AR2 

ARO, AR3 

1,ARO 

3,AR1 

5, AR2 

7,AR3 

@SINE_TABLE, AR7 Initialize table pointers. 


*++AR7 (IR1),R7 


*AR7, *AR2, RO 


*AR3,R7,R1 
RO,R1,R2 
*AR7, *+AR2 (IRO),RO 
RO,R1,R3 
*AR1,R3,R4 
*AR1,R3,R4 
R4, *AR2++(IRO) 
R2, *ARO,R4 
R4, *AR3++ (IRO) 
*ARO,R2,R4 
R4, *AR1++(IRO) 


LOOP 3_B 

*AR3,R7,R1 

R4, *ARO++ (IRO) 
RO,R1,R2 
*AR7, *+AR2 (IRO) , RO 


R7 = COS (2*pi/8) 
*ART = COS (2*pi/8) 


RO = X(I3)*COS 
R5 = X(14)*COS 


x (12) ¢————_ 


X(I1) ¢ 


R2 = [X(I3)*COS + X(I4) *COS] 
R3 = -—[X(I3)*COS - X(I4) *COS] 
R4 = -X(I2) + R3 — 

R4 = X(I2) + R3 

Xx (13) <— 

R4 = X(I1) - R2 — 

X(14) <4 

R4 = X(I1) + R2 
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Example 6—16. Real Forward Radix-2 FFT (Continued) 


SUBF3 RO,R1,R3 
SUBF3 *AR1,R3,R4 
ADDF3 *AR1,R3,R4 
STF R4, *AR2++(IRO) 
SUBF3 R2,*ARO,R4 
STF R4, *AR3++(IRO) 
LOOP3_B: ADDF3 *ARO,R2,R4 

STF R4, *AR1++(IRO) 
MPYF3 *AR3,R7,R1 
STF R4, *ARO++(IRO) 
ADDF3 RO,R1,R2 
SUBF3 RO,R1,R3 
SUBF3 *AR1,R3,R4 
ADDF3 *AR1,R3,R4 

|| STF R4, *AR2 
SUBF3 R2,*ARO,R4 

|| STF R4,*AR3 
ADDF3 *ARO,R2,R4 

|| STF R4,*AR1 
STF R4, *ARO 
;
;       Perform fourth FFT loop.
;
;       Part A:
;
;       AR1 -> I1    0  <- X(I1) + X(I3)
;                    1
;                    2
;                    3
;              I2    4
;                    5
;                    6
;                    7
;       AR2 -> I3    8  <- X(I1) - X(I3)
;                    9
;                   10
;                   11
;       AR3 -> I4   12  <- -X(I4)
;                   13
;                   14
;                   15
;       AR1 -> I5   16
;                   17
;                    .
;                    .
;
        LDI     @DEST_ADDR,AR1
        LDI     AR1,AR2
        LDI     AR1,AR3
        ADDI    8,AR2
        ADDI    12,AR3
        LDI     16,IR0
        LDI     @FFT_SIZE,RC
        LSH     -4,RC
        SUBI    2,RC
        SUBF3   *AR2,*AR1,R1
        ADDF3   *AR2,*AR1,R2
        NEGF    *AR3,R3
        RPTB    LOOP4_A
        LDF     *+AR2(IR0),R0       ; R0 = X(I3)
||      STF     R2,*AR1++(IR0)      ; X(I1) <- previous R2
        SUBF3   R0,*AR1,R1          ; R1 = X(I1) - X(I3)
||      STF     R1,*AR2++(IR0)      ; X(I3) <- previous R1
        ADDF3   R0,*AR1,R2          ; R2 = X(I1) + X(I3)
||      STF     R3,*AR3++(IR0)      ; X(I4) <- previous R3
LOOP4_A: NEGF   *AR3,R3             ; R3 = -X(I4)
;
        STF     R2,*AR1             ; X(I1) <- R2
        STF     R1,*AR2             ; X(I3) <- R1
        STF     R3,*AR3             ; X(I4) <- R3
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Example 6—16. Real Forward Radix-2 FFT (Continued) 


7 Pact. Bt 
. 0 
’ ARO > Il (3rd) | 1 ¢ X[I1] + [X(I3)*COS+ X(I4)*SIN] 
‘ I1 (2nd) | 2 
: Il (1st) | 3 
; 4 
; I2 (1st) |5 
i I2 (2nd) | 6 : 
i ARL > I2 (3rd) |7 ¢ X[I1] - [X(I3)*cOS+ X(I4)*SIN] 
; 8 
i AR2 >» I3 (3rd) |9 ¢ -X[I2] - [X(I3)*cOS- X(I4)*COS] 
: I3 (2nd) | 10 
: AR4 » I3 (1st) | 11 
: 12 
7 I4 (1st) | 13 
; I4 (2nd) | 14 . 
: AR3 » I4 (3rd) [15 @ X[I2] - [X(I3)*SIN- X(I4)*COS] 
po 16 
; ARO » 17 
v 
LDI @FFT_SIZE,RC 
LSH -4,RC 
LDI RC, IR1 
LDI 2, 1R0 
SUBI 3,RC 
LDI @DEST_ADDR, ARO 
LDI ARO, AR1 
LDI ARO, AR2 
LDI ARO, AR3 
LDI ARO, AR4 
ADDI 1,AR0 
ADDI 7,AR1 
ADDI 9, AR2 
ADDI 15,AR3 
ADDI 11,AR4 
LDI @SINE_TABLE, AR7 
LDF *++AR7 (IR1),R7 ; RT = SIN(1*[2*pi/16]) 
; *ART = COS(3*[2*pi/16]) 
LDI AR7, AR6 
LDF *++4AR6 (IR1),R6 ; R6 = SIN(2*[2*pi/16]) 
; *AR6 = COS(2*[2*pi/16]) 
LDI AR6, AR5 
LDF *++4AR5 (IR1),R5 ; RS = SIN(3*[2*pi/16]) 
; *AR5 = COS(1*[2*pi/16]) 
LDI 16,IR1 


MPYF'3 


MPYF3 
MPYF'3 
MPYF3 
DDF3 
PYF3 
UBF 3 
UBF 3 
DDF3 
TF 
UBF 3 
TF 
DD 
TF 


A 
M 
S 
S 
A 
S 
S 
S 
A 
S 


*AR7, *AR4,RO 


*++AR2 (IRO),R5,R4 
*—--AR3(IRO),R5,R1 
*AR7, *AR3, RO 
RO,R1,R2 

*AR6, *-AR4, RO 
R4,R0,R3 
*--AR1(IRO),R3,R4 
*AR1,R3,R4 

R4, *AR2—- 
R2,*++ARO (IRO),R4 
R4, *AR3 

F3 *ARO,R2,R4 

R4, *ARL 


*++AR3,R6,R1 
R4, *ARO 
RO,R1,R2 


RO,R1,R3 
*+4+AR1,R3,R4 
*AR1,R3,R4 
R4, *AR2 
R2,*--ARO,R4 


R4, *ARL 


*—--AR2,R7,R4 
R4, *ARO 
*+4+AR3,R7,R1 
*AR5, *AR3, RO 
RO,R1,R2 


R4,R0,R3 
*++AR1,R3,R4 
*AR1,R3,R4 

R4, *AR2++(IR1) 
R2,*--ARO,R4 
R4, *AR3++(IR1) 
*ARO,R2,R4 
R4, *AR1++(IR1) 


LOOP 4_B 
*++AR2(IRO),R5,R4 
R4,*ARO++ (IR1) 
*--AR3(IRO),R5,R1 
*AR7, *AR3,RO 
RO,R1,R2 

*AR6, *-AR4, RO 
R4,R0,R3 
*--AR1(IRO),R3,R4 
*AR1,R3,R4 


*AR5, *-AR4 (IRQ), RO 


*AR7, *++AR4 (IR1),RO 


RO = X (13) *COS (3) 

R4 = X (13) *SIN (3) 

Rl = X (14) *SIN (3) 

RO = X (14) *COS (3) 

R2 [X(I3)*COS + X(I4) *SIN] 
R3 -—[X(I3)*SIN - X(I4) *COS] 
R4 = -X(I2) + R37 

R4 = X(I2) + R3 

X(I13) <— 

R4 = X(I1) - R2 - 

X(14) 4 

R4 = X(I1) + R2 


X(I2) <——_ 


X(I1) <¢ 
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Example 6—16. Real Forward Radix-2 FFT (Continued) 


LOOP4_B: 


STF 
SUBF3 
STF 
ADDF3 
STF 


MPYF3 
STF 
DDF3 
PYF3 
UBF 3 
UBF 3 
DDF3 
TE 
UBF 3 
iF 
DDF3 
STF 


PHNNNnNPrTHNN SE py 


MPYF3 
STF 

MPYF3 
MPYF3 
DDF3 
PYF3 
UBF 3 
UBF 3 
DDF3 
TE 

UBF 3 
TE 

DDF3 
STF 


PHANNPrTHNN E Pp 


MPYF3 
STF 

MPYF3 
MPYF3 
DDF3 
PYF3 
UBF 3 
UBF 3 
DDF3 
TE 

UBF 3 
LE 

DDF3 
STF 


PNNNnNPrPHNHNE PY 


MPYF3 
STF 

ADDF3 
MPYF3 


R4, *AR2-- 
R2,*++ARO(IRO),R4 
R4, *AR3 
*BRO,R2,R4 

R4, *ARL 


*+4+AR3,R6,R1 
R4, *ARO 
RO,R1,R2 
*AR5, *-AR4 (IRO),RO 
RO,R1,R3 
*++AR1,R3,R4 
*AR1,R3,R4 
R4, *AR2 
R2,*--ARO,RA4 
R4, *AR3 
*BRO,R2,R4 
R4, *ARL 


*—--AR2,R7,R4 
R4, *ARO 
*+4+AR3,R7,R1 
*AR5, *AR3,RO 
RO,R1,R2 

*ART, *++AR4 (IR1),RO 
R4,R0,R3 
*++AR1,R3,R4 
*AR1,R3,R4 

R4, *AR2++(IR1) 
R2,*--ARO,RA 
R4, *AR3++(IR1) 
*BRO,R2,R4 

R4, *AR1++(IR1) 


*++AR2 (IRO),R5,R4 
R4, *ARO++ (IR1) 
*—-AR3(IRO),R5,R1 
*ART, *AR3, RO 
RO,R1,R2 

*AR6, *-AR4, RO 
R4,R0,R3 
*—--AR1(IRO),R3,R4 
*AR1,R3,R4 

R4, *AR2-- 
R2,*++ARO(IRO),R4 
R4, *AR3 
*BRO,R2,R4 

R4, *ARL 


*++AR3,R6,R1 

R4, *ARO 

RO,R1,R2 
*AR5, *-AR4 (IRO),RO 


RO,R1,R3 
*++AR1,R3,R4 
*AR1,R3,R4 
R4, *AR2 
R2,*--ARO,R4 
R4, *AR3 
*ARO,R2,R4 
R4, *ARL 


*—--AR2,R7,R4 
R4, *ARO 
*++AR3,R7,R1 
*AR5, *AR3,RO 
RO,R1,R2 
R4,R0,R3 
*++AR1,R3,R4 
*AR1,R3,R4 
R4, *AR2 
R2,*--ARO, RA 
R4, *AR3 
*ARO,R2,R4 
R4,*ARL 


R4, *ARO 
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Example 6—16. Real Forward Radix-2 FFT (Continued) 


Perform remaining FFT loops 


: [ X’ (11) 

: AR1L® | X(I1) (1st) 
; X(I1) (2nd) 
: X(I1) (3rd) 
7 A 

; x! (a2) 

, B 

' X(I2) (3rd) 
; X(I2) (2nd) 
. AR2® | x(I2) (1st) 
c xX’ (13) 

: aAR3” | x(13) (1st) 
; X(I3) (2nd) 
; X(13) (3rd) 
F Cc 

7 X’ (14) 

r D 

: X(I4) (3rd) 
J X(I4)_ (2nd) 
AR4® | x(14) (1st) 
’ ~ aARI> 


(loop 4 onwards). 


LOOP 
lst 2nd 

vey 

0 O @ xX’ (I1)+ X’ (13) 

al 1 @ X(I1) + [X(I3)*COS + 
2 2 

3 3 

8 16 

13. 29 

14 30 ; 

15 31 @ xX[I1] - [xX (13)*cos + 
16 32 @ x’ (I1)- xX’ (13) 

17 33 @ -X[I2]- [X(I3)*SIN - 
18 34 

19 35 

24 48 g@ -xX’ (14) 

29 61 ¢€ 

30 62 ; 

31 63 X[I2] = [X(I3)*SIN = 
32 64 

33 «665 


@FFT_SIZE, IRO 
-2,IRO 

RO, @SEPARATION 
—2,IRO 

5,R5 

3,R7 

16,R6 
@DEST_ADDR, AR5 
@DEST_ADDR, AR1 
-1,IR0 

1,R7 


X (14) *SIN] 


X (14) *SIN] 


X (14) *COS] 


X (14) *COS] 
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Example 6—16. Real Forward Radix-2 FFT (Continued) 


LOOP : ADDI 1,R7 
LSH 1,R6 
LDI AR1,AR4 
ADDI R7,AR1 ; AR1 points at A. 
LDIA R1,AR2 
ADDI 2,AR2 ; AR2 points at B. 
ADDI R6,AR4 
SUBI R7,AR4 ; AR4 points at D. 
LDI AR4,AR3 
SUBI 2,AR3 ; AR3 points at C. 
LDI @SINE_TABLE, ARO ; ARO points at SIN/COS table. 
LDI R7, IRL 
LDI R7,RC 
INLOP: ADDF3 *--AR1(IR1),*++AR2(IR1),RO ; RO = X’(I1) + X’ (13) 7 
SUBF3 *—-AR3(IR1),*AR1++,R1 ; RL = xX’ (Il) - X’ (13)7 
NEGF *--AR4,R2 ; R2 = -X’ (14) ——] 
|| STF RO, *-AR1 ; xX’ (I1) 
STF R1, *AR2-- ; xX’ (13) 
|| STF R2, *AR4++(IR1) ; X' (14) <«<——_ 
LDI @SEPARATION, IR1 ; IR1L=SEPARATION 
BETWEEN SIN/COS TBLS 
SUBI 3,RC 
MPYF3 *++AR0 (IRO), *AR4,R4 ; R4 = X(1I4)*SIN 
MPYF3 *ARO, *++AR3,R1 ; Rl = X(I3)*SIN 
MPYF3 *++ARO(IR1),*AR4,RO ; RO = X(I4)*COS 
MPYF3 *ARO, *AR3, RO ; RO = X(I3)*COS 
SUBF3 R1,R0,R3 ; R3 = -[X(I3)*SIN - X(I4) *COS] 
MPYF3 *++ARO (IRO), *-AR4,RO0 
ADDF3 RO,R4,R2 ; R2 = X(13)*COS + X(14)*SIN 
SUBF3 *AR2,R3,R4 ; R4 = R3 - X(I2) 
ADDF3 *AR2,R3,R4 ; R4 = R3 + X(I2) 
STF R4, *AR3++ ; X(13) <—— 
SUBF3 R2,*AR1,R4 ; R4 = X(I1) - R2 
STF R4, *AR4-- ; X(14) « 
ADDF3 *AR1,R2,R4 ; R4 = X(I1) + R2 
STF R4, *AR2-- ; X(12) <———_ 
, 
RPTB IN_BLK i 
LDF *-ARO(IR1),R3 ; 
MPYF3 *AR4,R3,R4 ; 
STF R4, *AR1++ ; X(I1) «4 
MPYF3 *AR3,R3,R1 
MPYF3 *ARO, *AR3, RO 
SUBF3 R1,R0,R3 
MPYF3 *++ARO (IRO), *-AR4,RO 
ADDF3 RO,R4,R2 
SUBF3 *AR2,R3,R4 
ADDF3 *AR2,R3,R4 
STF R4, *AR3++ 
SUBF3 R2,*AR1,R4 
STF R4, *AR4-- 
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Example 6—16. Real Forward Radix-2 FFT (Continued) 


IN_BLK: 


ADDF3 
| | STF 


*AR1,R2,R4 

R4, *AR2-— 
*-BARO(IR1),R3 
*AR4,R3,R4 

R4, *AR1++ 
*AR3,R3,R1 
*ARO, *AR3, RO 
R1,R0,R3 
R6,IR1 
RO,R4,R2 
*AR2,R3,R4 
*AR2,R3,R4 

R4, *AR3++ (IR1) 
R2,*AR1,R4 

R4, *AR4++(IR1) 
*AR1,R2,R4 

R4, *AR2++ (IR1) 


R4, *AR1++(IR1) 


AR5,AR1,R0 
@FFT_SIZE, RO 
INLOP H 


a 


@SINE_TABLE, ARO i 


R7,IR1 
R7,RC 


1,R5 
@LOG_SIZE,R5 
LOOP 
@DEST_ADDR, AR1 
-1,IR0 

1,R7 


’ 


DP ; 
; 

AR7 

AR6 

AR5 

AR4 

R7 

R7 

R6 

R6 

R5 

R4 

FP 


LOOP BACK TO THI 
INNER LOOP 

ARO POINTS TO 
SIN/COS TABLE 


je) 


Return to C environment. 


Restore C environment 
variables. 
Example 6-17 shows the implementation of a radix-2 real inverse FFT. The 
inverse transformation assumes that the input data is in the same order as the 
output of the forward transformation. It also produces a time signal in the proper 
order. In other words, bit reversing takes place at the end of the program. 
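
The relationship between the forward and inverse routines can be sketched as a round trip in Python. The sketch below is a reference model only: it uses direct O(N^2) sums instead of the radix-2 butterflies, it assumes the packed spectrum order R(0), ..., R(N/2), I(N/2-1), ..., I(1) described for Example 6-16, and it applies a 1/N scaling in the inverse so that the round trip returns the original samples. Whether ifft_rl itself applies that scaling is not stated in this section, and the function names are hypothetical.

```python
import cmath
import math


def forward_packed(x):
    """Reference forward transform of a real sequence, returned in packed order."""
    n = len(x)
    half = n // 2
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(half + 1)]
    reals = [s.real for s in spec]
    imags = [spec[k].imag for k in range(half - 1, 0, -1)]
    return reals + imags


def inverse_packed(packed):
    """Reference inverse: packed spectrum in, real samples in natural order out."""
    n = len(packed)
    half = n // 2
    out = []
    for t in range(n):
        # X(0) and X(N/2) are purely real; X(N/2) contributes with sign (-1)**t.
        acc = packed[0] + packed[half] * math.cos(math.pi * t)
        for k in range(1, half):
            angle = 2.0 * math.pi * k * t / n
            # X(k) and X(N-k) are conjugates, so together they give one real term.
            acc += 2.0 * (packed[k] * math.cos(angle) - packed[n - k] * math.sin(angle))
        out.append(acc / n)  # assumed 1/N normalization
    return out


if __name__ == "__main__":
    samples = [0.25 * t * t - t + math.cos(t) for t in range(16)]
    restored = inverse_packed(forward_packed(samples))
    print(max(abs(a - b) for a, b in zip(samples, restored)))
```

Because the inverse consumes exactly the array that the forward transform produces, no reordering step is needed between the two calls.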


Example 6-17. Real Inverse Radix-2 FFT 


* Real Inverse FFT 


*****************************************************************************
*
* FILENAME     : ifft_rl.asm
*
* WRITTEN BY   : Daniel Mazzocco
*                Texas Instruments, Houston
*
* DATE         : 18th Feb 1992
*
* VERSION      : 1.0
*
*****************************************************************************
*
* VER    DATE          COMMENTS
* 1.0    18th Feb 92   Original release. Started from forward real FFT
*                      routine written by Alex Tessarolo, rev 2.0
*
*****************************************************************************
*
* SYNOPSIS: int ifft_rl( FFT_SIZE, LOG_SIZE, SOURCE_ADDR, DEST_ADDR,
*                        SINE_TABLE, BIT_REVERSE );
*
*           int    FFT_SIZE     ; 64, 128, 256, 512, 1024, ...
*           int    LOG_SIZE     ; 6, 7, 8, 9, 10, ...
*           float  *SOURCE_ADDR ; Points to where data is originated
*                               ; and operated on.
*           float  *DEST_ADDR   ; Points to where data will be stored.
*           float  *SINE_TABLE  ; Points to the SIN/COS table.
*           int    BIT_REVERSE  ; = 0, bit reversing is disabled.
*                               ; <> 0, bit reversing is enabled.
*
* NOTE:     1) If SOURCE_ADDR = DEST_ADDR, then in place bit
*              reversing is performed, if enabled (more
*              processor intensive).
*           2) FFT_SIZE must be >= 64 (this is not checked).
*
* DESCRIPTION: Generic function to do an inverse radix-2 FFT computation
*              on the C30.
*              The data array is FFT_SIZE long with real and imaginary
*              points R and I as follows:
*
*              SOURCE_ADDR[0]          -> R(0)
*                                         ...
*                                         R(FFT_SIZE/2)
*                                         I(FFT_SIZE/2 - 1)
*                                         ...
*              SOURCE_ADDR[FFT_SIZE-1] -> I(1)
*
*              The output data array will contain only real values.
*              Bit reversal is optionally implemented at the end
*              of the function.
*
*              The sine/cosine table for the twiddle factors is expected
*              to be supplied in the following format:
*
*              SINE_TABLE[0]            -> sin(0*2*pi/FFT_SIZE)
*                                          sin(1*2*pi/FFT_SIZE)
*                                          ...
*                                          sin((FFT_SIZE/2-2)*2*pi/FFT_SIZE)
*              SINE_TABLE[FFT_SIZE/2-1] -> sin((FFT_SIZE/2-1)*2*pi/FFT_SIZE)
*
*              NOTE: The table is the first half period of a sine wave.
*
*              Stack structure upon call:
*
*              -FP(7)  BIT_REVERSE
*              -FP(6)  SINE_TABLE
*              -FP(5)  DEST_ADDR
*              -FP(4)  SOURCE_ADDR
*              -FP(3)  LOG_SIZE
*              -FP(2)  FFT_SIZE
*              -FP(1)  return addr
*              -FP(0)  old FP
*
*****************************************************************************
*
* NOTE:     Calling C program can be compiled using either large
*           or small model.
*
* WARNING:  DP initialized only once in the program. Be wary
*           with interrupt service routines. Make sure interrupt
*           service routines save the DP pointer.
*
* WARNING:  The SOURCE_ADDR must be aligned such that the first
*           LOG_SIZE bits are zero (this is not checked by the
*           program).
*
*****************************************************************************


*
* REGISTERS USED: R0, R1, R2, R3, R4, R5, R6, R7
*                 AR0, AR1, AR2, AR3, AR4, AR5, AR6, AR7
*                 IR0, IR1
*                 RC, RS, RE
*                 DP
*
* MEMORY REQUIREMENTS: Program = 322 words (approximately)
*                      Data    = 7 words
*                      Stack   = 12 words
*
*****************************************************************************
*
* BENCHMARKS: Assumptions - Program in RAM0
*                         - Reserved data in RAM0
*                         - Stack on primary/expansion bus RAM
*                         - Sine/cosine tables in RAM0
*                         - Processing and data destination in RAM1
*                         - Primary/expansion bus RAM, 0 wait state
*
*             FFT Size   Bit Reversing   Data Source   Cycles (C30)
*             1024       OFF             RAM1          25892 approx.
*
*             Note: This number does not include the C callable overheads.
*                   Add 57 cycles for these overheads.
*
*****************************************************************************
*
FP           .set    AR3
             .global _ifft_rl            ; Entry execution point.
FFT_SIZE:    .usect  ".ifftdata",1       ; Reserve memory for arguments.
LOG_SIZE:    .usect  ".ifftdata",1
SOURCE_ADDR: .usect  ".ifftdata",1
DEST_ADDR:   .usect  ".ifftdata",1
SINE_TABLE:  .usect  ".ifftdata",1
BIT_REVERSE: .usect  ".ifftdata",1
SEPARATION:  .usect  ".ifftdata",1


;
; Initialize C Function.
;
        .sect   ".iffttext"
_ifft_rl:
        PUSH    FP                  ; Preserve C environment.
        LDI     SP,FP
        PUSH    R4
        PUSH    R5
        PUSH    R6
        PUSHF   R6
        PUSH    R7
        PUSHF   R7
        PUSH    AR4
        PUSH    AR5
        PUSH    AR6
        PUSH    AR7
        PUSH    DP
        LDP     FFT_SIZE            ; Initialize DP pointer.
        LDI     *-FP(2),R0          ; Move arguments from stack.
        STI     R0,@FFT_SIZE
        LDI     *-FP(3),R0
        STI     R0,@LOG_SIZE
        LDI     *-FP(4),R0
        STI     R0,@SOURCE_ADDR
        LDI     *-FP(5),R0
        STI     R0,@DEST_ADDR
        LDI     *-FP(6),R0
        STI     R0,@SINE_TABLE
        LDI     *-FP(7),R0
        STI     R0,@BIT_REVERSE


LOOP: 


; Perform last FFT loops first (loop 2 
LOOP 
lst 2nd 
_ vv 
X’ (11) 0 0 <¢ 
AR1® | X(I1) (1st) 1 1 <¢ 
X(I1) (2nd) 2 2 
X(I1) (3rd) 3 3 
A 
X’ (12) 8 16 ¢ 
B 
X(I2) (3rd) L3 29 
X(12) (2nd) | 14 30 
AR2 ® | X(z2) (Ist) | 15 31 € 
xX’ (13) 16 32 ¢€ 
AR3 ® | X(1I3) (1st) 17 33 ¢€ 
X(13) (2nd) ]/ 18 34 
X(I3) (3rd) 19 35 
fe! 
Xx’ (14) 24 48 «¢ 
D 
X(1I4) (3rd) 29 61 
X(14) (2nd) | 30 62 
aR4® | x(14) (Ist) | 31 63 ¢ 
L 32 64 
AR1 > 33. 65 
LDI 1,IR0
LDI 4,R5
LDI @FFT_SIZE,R7
LSH -2,R7
SUBI 1,R7
LDI @FFT_SIZE,R6
LSH 1,R6
LDI @SOURCE_ADDR,AR5
LDI @SOURCE_ADDR,AR1
LSH -1,R6
LDI AR1,AR4
ADDI R7,AR1


onwards). 


Xx’ (11) + X’ (13) 


X(I1) + 


[X (12) 


XC 2)* 22 


X[I4] - 


[X(I3) 


X’ (I1)- X’ (13) 


[X (I1) -X (12) ] *COS-[X(I3)+xX(I4) ]*SIN 


-X’ (14) *2 


[X(I2)-X (12) ]*SIN+[X(1I3)+xX (14) ]*COS 


; Step between two consecutive sines 


; Stage number from 4 to M. 


; R7 is FFT_SIZI 


, R6 is FFT_SIZE at the lst loop. 


E/4-1 


; AR1 points at A. 


(ie 15 for 64 pts) 
, and will be used to point at A & D. 
; R6 will be used to point at D. 


FP HrHr NHN P Pe 
iw) 
A 


INLOP: 


SH 


AR1, AR2 
2,AR2 
R6,AR4 


r 


AR2 points at B. 


R7,AR4 ; AR4 points at D. 


AR4, AR3 


2,AR3 ; AR3 points at C. 


R7, IR1 
R7,RC 


*--AR1(IR1),* 
—-AR3(IR1),RO 
*AR3,*AR1,R1 
*--AR4,R2 

RO, *AR1++ 
-2.0,R2 
*—--AR2,R3 

1, *AR3++ 
2.0,R3 

R3, *AR2++ (IR1) 
R2, *AR4++(IR1) 


@FFT_SIZE,IR1 


@SINE_TABLE, ARO 
-2,IR1 
3, RC 


*AR2,*AR1,R3 
*AR1, *AR2,R2 
R3,*++ARO(IRO),R1 
*AR4,R4 

R3, *++ARO (IR1),RO 
*AR3,R4,R3 

R4, *AR3,R2 
R2,*AR1L++ 
R2,*ARO--(IR1),R4 
R3, *AR2—- 
R4,R1,R3 
R2,*ARO,R1 

R3, *AR4——- 
R1,RO,R4 


IN_BLK 


RO XY (TL) + Xk? (13) — J 
Rl = X’ (11) - X’ (13) 74 


Xx’ (I1) «4 
R2 = -2*xX’ (14) — 


x’ (13) 4 
R3 = 2*X! (12) 7 
x? (12). <——— 
xX’ (14) <——_ 


IR1l=separation between SIN/ 
COS tbls 
ARO points at SIN/COS table. 


R3 = X(I1)-X(I2) 

R2 = X(I1)+X(I2) —7J) 
Rl = R3*SIN 

R4 = X(I4) 

RO = R3*COS 

R3 = X(14)-X(I3) 

R2 = X(13)+X(I4) 

X(I1) + —_—_ 
R4 = R2*COS 

X (12) < 

R3 = R3*SIN + R2*COS 
Rl = R2*SIN 

X (14) < 

R4 = R3*COS - R2*SIN 


IN_BLK: 


PYF3 


HA 
ty 


PYF3 


A 
M 
S 
S 
S 
A 
MPYF3 
S 
L 
M 
S 
A 


OHH 


3 
U fet 
ToOHwAVvU 


*AR2,*AR1,R3 
*AR1, *AR2,R2 
R3,*++ARO (IRO),R1 
R4, *AR3++ 
*AR4,R4 

R3, *++ARO (IR1),RO 
*AR3,R4,R3 

R4, *AR3,R2 

R2, *AR1++ 

R2, *ARO--(IR1),R4 
R3, *AR2-- 
R4,R1,R3 

R2, *ARO,R1 

R3, *AR4-- 
R1,RO,R4 


*AR2,*AR1,R3 
*AR1, *AR2,R2 
R3,*++ARO(IRO),R1 
R4, *AR3++ 

*AR4,R4 

R3, *++ARO (IR1),RO 
*AR3,R4,R3 

R4, *AR3,R2 
R2,*ARL 
R2,*ARO--(IR1),R4 
R3, *AR2 

R6,IR1 
R4,R1,R3 
R2,*ARO,R1 

R3, *AR4++(IR1) 
R1,RO,R4 
*AR1++(IR1),R2 
R4, *AR3++(IR1) 


AR5,AR1,RO 
@FFT_SIZE, RO 
INLOP 
*AR2++(IR1) 
R7,IR1 
R7,RC 


1,R5 
@LOG_SIZE, R5 
LOOP 
@SOURCE_ADDR, AR1 
1, IRO 

=15.R7 


R3 = X(1I1)-X(12) 
R2 = X(1I1)+X(1I2) 
Rl = R3*SIN 
X(I3) 

R4 = X(1I4) 

RO = R3*COS 


R3 = X(14)-X(1I3) 
R2 = X(13)+X(14) 


X(I1) 4 
R4 = R2*COS 
xX(I2) 4 


R3 = R3*SIN + R2*COS 
Rl = R2*SIN 

X (14) < 
R4 = R3*COS - R2*SIN 


R3 = X(1I1)-X(12) 
R2 = X(1I1)+X(12) 
Rl = R3*SIN 
X(I3) 

R4 = X(T4) 

RO = R3*COS 

R3 = X(14)-X(1I3) 
R2 X (13) +X (14) 
X(I1) 4 
R4 = R2*COS 

X (12) +H 
Get prepared for the next 
R3 = R3*SIN + R2*COS 

Rl = R2*SIN 


x(14) 4 
= R3*COS - R2*SIN 


Loop back to the inner loop 
Dummy 


Next stage if any left 


Double step in sinus table 


, 

, Part A: 
v ——« 

i ARI 
i 

; 

i AR2 
ys 

i 

c AR3 
v 

i 

i 

; AR3 
; 

Pele 

i ARI 
i 

; 

; 

i 

rs 
LOOP3_A: 


; Perform third FFT loop. 


TA 


I2 


I3 


14 


PPPorrr 
iw) 
H 


0 ¢€ X (Il) 
1 

2 <€ 2  * X(I2) 
3 

4 4€ X (Il) 
5 

6 € -2  * X(I4) 
7 

8 
9 


@SOURCE_ADDR, AR1 
AR1, AR2 

AR1,AR3 

AR1,ARA4 

2, AR2 

4, AR3 

6, AR4 

8, IRO 
@FFT_SIZE,RC 


@SINE_TABLE, ARO 


*AR4,R2 
RO, *AR1++ (IRO) 
=2..0,R2 
*AR2,R3 
R1, *AR3++ (IRO) 


R3, *AR2++ (IRO) 
R2, *AR4++ (IRO) 


+ X(I3) 


X(I3) 


7 ARO points at SIN/COS table. 


RO 
R1 


xP (CI) KY (13) 
MP (LA) = KX C13) 


xX’ (11) 4 
R2 = -2*xX’ (14) 


x’ (13) 4 
R3 = 2*X’ (12) 


Xx’ (12) <——— 
xX’ (14) <—— 


TT i Te i ee nT? 


vvevwpegyeosy>$ 


oma uF WNEF OO 


X(I1) 


X (11) 


A A A A 


NOTE: 


+ X(I2) 


- X(I3) 


[X(I1)- X(I2)]*COS- [X(I3)+ X(1I4)]*SIN 


[X(I1)- X(I2)]*SIN+ [X(1I3)+ X(14)]*COS] 


COS (2*pi/8) = SIN(2*pi/8) 


@SOURCE_ADDR, AR1 


R1, AR2 
R1,AR3 
R1,AR4 
1,AR1 

3, AR2 
5,AR3 
7,BR4 
@SINE_TABL 
@FFT_SIZE, 
=3,, RC 

RC, IR1 
2,RC 


E, ART 
RC 


; ART points at SIN/COS table. 


LOOP3_B: 


TE 


ab oy 


RPTB 


TE 


SI 


*AR2, 
*AR3, 
R6, *A 
R6, *A 
RO,R4 
RO,R4 
RO, *A 
R5,*A 
R2,*A 
R1,*A 


*AR4, 
R2,*A 
R1,*A 


*AR2, 
RO, *A 
R6, *A 
*AR3, 
R6, *A 
RO,R4 
RO,R4 
RO, *A 
R5,*A 
R2,*A 
R1,*A 
R5,*A 
*AR4, 
R2,*A 
R1,*A 


RO, *A 


R6 

RO 

R1,R5 
R1,R4 

,R3 

¢R2 

R4,R1 
R1++(IRO) 
R4,R5 
R2++ (IRO) 


R5,*++AR7(IR1),R1 


R3,R2 
R7,RO 
R4++(IRO) 


LOOP 3_B 


R6 

R3++ (IRO) 
R1,R5 

RO 
R1,R4 
,R3 
,R2 
R4,R1 
R1++(IRO) 
R4,R5 
R2++(IRO) 
R7,R1 
R3,R2 
R7,RO 
R4++ (IRO) 


R3 


R6 = 
RO = 
R5 = 
R4 = 


X (12) 
X(I3) 
X (11) +X (12) 
X(I1)-X (12) 


R3 


R2 = 
R1 = 
X(I1) 
R5 = 
X (12) 
R1 = 
R2 
RO = 
X (14) 


R6 = 
X (13) 
R5 = 
RO = 
R4 = 
R3 = 
R2 = 
Rl = 
X(I1) 
R5 = 
X (12) 
Rl = 


X (I1) -X (12) -X (13) 
X (I1) -X (12) +X (13) 


X (14) -X (13) 
< 
X(I1)-X (12) +X (13) +X (14) 


< 


R5S*SIN 
X (11) -X (12) -X (13) -X (14) 
R2*SIN |= =—— 

< 


X (12) 
< 

X (11) +X (12) 

X(I3) 

X(I1)-X (12) 

X(I1)-X (12) -X (13) 

X(I1)-X (12) +X (I3) 


X (14) -X (13) 


< 
X(I1)-X (12) +X (13) +X (14) 
< 
RS*SIN € 


R2 


RO = 
X (4) 


X (13) 


X (11) -X (12) -X (13) -X (14) 
R2*SIN 
< 
;
; Perform first and second FFT loops.
;
;       AR1 -> I1    0  <- X(I1) + X(I3) + 2*X(I2)
;       AR2 -> I2    1  <- X(I1) + X(I3) - 2*X(I2)
;       AR3 -> I3    2  <- X(I1) - X(I3) - 2*X(I4)
;       AR4 -> I4    3  <- X(I1) - X(I3) + 2*X(I4)
;       AR1 ->       4
;
        LDI     @SOURCE_ADDR,AR1
        LDI     AR1,AR2
        LDI     AR1,AR3
        LDI     AR1,AR4
        ADDI    1,AR2
        ADDI    2,AR3
        ADDI    3,AR4
        LDI     4,IR0
        LDI     @FFT_SIZE,RC
        LSH     -2,RC
        SUBI    2,RC
        LDF     *AR4,R6             ; R6 = X(I4)
        LDF     *AR2,R7             ; R7 = X(I2)
||      LDF     *AR1,R1             ; R1 = X(I1)
        MPYF    2.0,R6              ; R6 = 2 * X(I4)
        MPYF    2.0,R7              ; R7 = 2 * X(I2)
        SUBF3   R6,*AR3,R5          ; R5 = X(I3) - 2*X(I4)
        SUBF3   R5,R1,R4            ; R4 = X(I1)-X(I3)+2X(I4)
        SUBF3   R7,*AR3,R5          ; R5 = X(I3) - 2*X(I2)
||      STF     R4,*AR4++(IR0)      ; X(I4) <-
        ADDF3   R5,R1,R3            ; R3 = X(I1)+X(I3)-2X(I2)
        ADDF3   R6,*AR3,R4          ; R4 = X(I3) + 2*X(I4)
||      STF     R3,*AR2++(IR0)      ; X(I2) <-
        SUBF3   R4,R1,R4            ; R4 = X(I1)-X(I3)-2X(I4)
        ADDF3   R7,*AR3,R0          ; R0 = X(I3) + 2*X(I2)
||      STF     R4,*AR3++(IR0)      ; X(I3) <-
        ADDF3   R0,R1,R0            ; R0 = X(I1)+X(I3)+2X(I2)
        RPTB    LOOP1_2             ;
        LDF     *AR4,R6             ; R6 = X(I4)
||      STF     R0,*AR1++(IR0)      ; X(I1) <-
        MPYF    2.0,R6              ; R6 = 2 * X(I4)
        LDF     *AR2,R7             ; R7 = X(I2)
||      LDF     *AR1,R1             ; R1 = X(I1)
        MPYF    2.0,R7              ; R7 = 2 * X(I2)
        SUBF3   R6,*AR3,R5          ; R5 = X(I3) - 2*X(I4)
        SUBF3   R5,R1,R4            ; R4 = X(I1)-X(I3)+2X(I4)
        SUBF3   R7,*AR3,R5          ; R5 = X(I3) - 2*X(I2)
||      STF     R4,*AR4++(IR0)      ; X(I4) <-
        ADDF3   R5,R1,R3            ; R3 = X(I1)+X(I3)-2X(I2)
        ADDF3   R6,*AR3,R4          ; R4 = X(I3) + 2*X(I4)
||      STF     R3,*AR2++(IR0)      ; X(I2) <-
        SUBF3   R4,R1,R4            ; R4 = X(I1)-X(I3)-2X(I4)
        ADDF3   R7,*AR3,R0          ; R0 = X(I3) + 2*X(I2)
||      STF     R4,*AR3++(IR0)      ; X(I3) <-
LOOP1_2: ADDF3  R0,R1,R0            ; R0 = X(I1)+X(I3)+2X(I2)
        STF     R0,*AR1             ; Last X(I1) <-
;
; Check bit reversing mode (on or off).
;
; BIT_REVERSING = 0, then OFF (no bit reversing).
; BIT_REVERSING <> 0, then ON.
;
        LDI     @BIT_REVERSE,R0
        CMPI    0,R0
        BZ      MOVE_DATA
;
; Check bit reversing type.
;
; If SourceAddr = DestAddr, then in place bit reversing.
; If SourceAddr <> DestAddr, then standard bit reversing.
;
        LDI     @SOURCE_ADDR,R0
        CMPI    @DEST_ADDR,R0
        BEQ     IN_PLACE
;
; Bit reversing type 1 (from source to destination).
;
; NOTE: abs(SOURCE_ADDR - DEST_ADDR) must be > FFT_SIZE,
;       this is not checked.
;
        LDI     @FFT_SIZE,R0
        SUBI    2,R0
        LDI     @FFT_SIZE,IR0
        LSH     -1,IR0              ; IR0 = half FFT size.
        LDI     @SOURCE_ADDR,AR0
        LDI     @DEST_ADDR,AR1
        LDF     *AR0++,R1
        RPTS    R0
        LDF     *AR0++,R1
||      STF     R1,*AR1++(IR0)B
        STF     R1,*AR1++(IR0)B
        BR      DIVISION


;
; In-place bit reversing.
;
; Bit reversing on even locations, 1st half only.
;
IN_PLACE: LDI   @FFT_SIZE,IR0
        LSH     -2,IR0              ; IR0 = quarter FFT size.
        LDI     2,IR1
        LDI     @FFT_SIZE,RC
        LSH     -2,RC
        SUBI    3,RC
        LDI     @DEST_ADDR,AR0
        LDI     AR0,AR1
        LDI     AR0,AR2
        NOP     *AR1++(IR0)B
        NOP     *AR2++(IR0)B
        LDF     *++AR0(IR1),R0
        LDF     *AR1,R1
        CMPI    AR1,AR0             ; Xchange locations only if AR0<AR1.
        LDFGT   R0,R1
        LDFGT   *AR1++(IR0)B,R1
        RPTB    BITRV1
        LDF     *++AR0(IR1),R0
||      STF     R0,*AR0
        LDF     *AR1,R1
||      STF     R1,*AR2++(IR0)B
        CMPI    AR1,AR0
        LDFGT   R0,R1
BITRV1: LDFGT   *AR1++(IR0)B,R0
        STF     R0,*AR0
        STF     R1,*AR2
;
; Perform bit reversing on odd locations, 2nd half only.
;
        LDI     @FFT_SIZE,RC
        LSH     -1,RC
        LDI     @DEST_ADDR,AR0
        ADDI    RC,AR0
        ADDI    1,AR0
        LDI     AR0,AR1
        LDI     AR0,AR2
        LSH     -1,RC
        SUBI    3,RC
        NOP     *AR1++(IR0)B
        NOP     *AR2++(IR0)B
        LDF     *++AR0(IR1),R0
        LDF     *AR1,R1
        CMPI    AR1,AR0             ; Xchange locations only if AR0<AR1.
        LDFGT   R0,R1
        LDFGT   *AR1++(IR0)B,R1
        RPTB    BITRV2
        LDF     *++AR0(IR1),R0
||      STF     R0,*AR0
        LDF     *AR1,R1
||      STF     R1,*AR2++(IR0)B
        CMPI    AR1,AR0
        LDFGT   R0,R1
BITRV2: LDFGT   *AR1++(IR0)B,R0
        STF     R0,*AR0
        STF     R1,*AR2
;
; Perform bit reversing on odd locations, 1st half only.
;
        LDI     @FFT_SIZE,RC
        LSH     -1,RC
        LDI     RC,IR0
        LDI     @DEST_ADDR,AR0
        LDI     AR0,AR1
        ADDI    1,AR0
        ADDI    IR0,AR1
        LSH     -1,RC
        LDI     RC,IR0
        SUBI    2,RC
        LDF     *AR0,R0
        LDF     *AR1,R1
        RPTB    BITRV3
        LDF     *++AR0(IR1),R0
||      STF     R0,*AR1++(IR0)B
BITRV3: LDF     *AR1,R1
||      STF     R1,*-AR0(IR1)
        STF     R0,*AR1
        STF     R1,*AR0
        BR      DIVISION
;
; Check data source locations.
;
; If SourceAddr = DestAddr, then do nothing.
; If SourceAddr <> DestAddr, then move data.
;


MOVE_DATA: 


PD YP PROX wan 
Oo 
H 


H 
Hy] 


H 
Lea 


DIVISION: 


OP 


Oo 
a] 


MPYF3 


LAST_LOOP: 


@SOURCE_ADDR, RO 
@DEST_ADDR, RO 
IVISION 


@FFT_SIZE, RO 
2,RO0 
@SOURCE_ADDR, ARO 
@DEST_ADDR, AR1 


*ARO++,R1 


RO 
*ARO++,R1 
R1, *AR1++ 


R1, *ARL 


2, IR0 
@FFT_SIZE, RO 


RO ; exp = LOG_SIZE 
RO ; 32 MSB’S saved 
RO 

RO ; Neg exponent 
RO 

RO ; RO = 1/FFT_SIZI 
@DEST_ADDR, AR1 
@DEST_ADDR, AR2 
*AR2++ 

@FFT_SIZE,RC 

-1,RC 

2,RC 

RO, *AR1,R1 
LAST_LOOP 


RO, *AR2,R2 
*AR1++ (IRO) 

RO, *AR1,R1 

R2, *AR2++(IRO) 


RO, *AR2,R2 
R1, *ARL 
R2, *AR2 


fl 


lst location 


2nd, 4th, 6th,... 


3rd,5th; /thy... 2 


Last location 


location 


location 
;
; Return to C environment.
;
        POP     DP                  ; Restore C environment variables.
        POP     AR7
        POP     AR6
        POP     AR5
        POP     AR4
        POPF    R7
        POP     R7
        POPF    R6
        POP     R6
        POP     R5
        POP     R4
        POP     FP
        RETS
        .end
*
* No more.
*
*****************************************************************************


The ’C3x quickly executes FFT lengths up to 1024 points (complex) or 2048
points (real), covering most applications. It performs this task almost
entirely in on-chip memory. See Table 6-2 for the number of CPU clock cycles
required for FFT lengths between 64 and 1024 points for the four algorithms.
6.7 TMS320C3x Benchmarks


Table 6-1 provides benchmarks for common DSP operations. Table 6-2
summarizes the FFT execution time required for FFT lengths between 64 and 1024
points for the algorithms in Example 6-13, Example 6-15, Example 6-16,
and Example 6-17.


The benchmarks are given in clock cycles (the H1 internal processor cycle).
To convert a benchmark to an execution time, multiply the number of cycles by
the processor’s internal clock period. The H1 clock runs at half the input
clock frequency, so for a 60 MHz ’C3x, for example, multiply by 33 ns.
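
As a worked illustration of this conversion, the short Python sketch below turns a cycle count into seconds. The function name is hypothetical, and the sketch assumes that the H1 period is two input clock periods, as stated above.

```python
def cycles_to_seconds(cycles, input_clock_hz):
    """Convert a benchmark in H1 cycles to seconds.

    Assumes the H1 cycle lasts two periods of the input clock,
    for example 33.3 ns for a 60 MHz device.
    """
    h1_period = 2.0 / input_clock_hz
    return cycles * h1_period


# 1024-point real inverse FFT from Table 6-2: 25 900 cycles on a 60 MHz part.
execution_time = cycles_to_seconds(25900, 60e6)
```

With these inputs the result is roughly 863 microseconds.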


Table 6—1. TMS320C3x Application Benchmarks 


Application                                    Words     Cycles
---------------------------------------------------------------------
Inverse of a floating-point number             31        31
  (32-bit precision)
Square root                                    38        46
Double precision integer add/subtract          2         2
Double precision integer multiply              24        24
IEEE to ’C3x format conversion (fast)          12        9
IEEE to ’C3x format conversion (complete)      33        19
’C3x to IEEE format conversion (fast)          14        10
’C3x to IEEE format conversion (complete)      24        27
FIR filter                                     5         6+N
IIR filter (one biquad)                        7         7
IIR filter (N > 1 biquads)                     16        13+6N
LMS adaptive FIR filter                        11        134+3N
Matrix-vector multiplication                   10        2+10K+K(N-1)
Vector dot product                             6         N+4
Vector maximum                                 5         2+3N
Forward LPC lattice filter                     11        5+3P
Inverse LPC lattice filter                     9         6+3P
u-law (A-law) compression                      16(18)    16(18)
u-law (A-law) expansion                        13(15)    16(21)
Table 6-2. TMS320C3x FFT Timing Benchmarks (Assumes Data On Chip and No Bit
Reversing)


Number of 
Points 


64 

128 
256 
512 


1024 


512 
1024 
2048 


4096 


Radix-2 
(Complex) 


1481 
3445 
7865 


17 709 
17 709 ('C31) 
42 210 ('C32) 


39 600 (’C30) 
40 100 (’C31) 
94 519 (’C32) 


25 688 (’C32) 
64 781 (’C32) 
11 611 (’C30) 


117 400 (’C31) 


280 800 (’C30) 
283 600 (’C31) 


Radix-4 
(Complex) 


2050 


10400 


50 670 


Number of CPU Clock Cycles 


Radix—2 
(Real) 


791 

1746 
3925 
8840 


19 820 


Radix-2 
(Real Inverse) 


1064 
2369 
5282 
11731 


25 900 


These benchmarks include C overhead: they represent the number of cycles
between the standard C-compiler _main and _exit labels.


These benchmarks do not include the final bit-reversing stage. If bit-reversing 
is required, it is implemented in a serial fashion in off-chip memory. 
6.8 Sliding FFT


SFFT.ASM uses a technique known as a sliding FFT (SFFT) to calculate the
spectrum of a signal on a sample-by-sample basis. The SFFT is particularly
well suited to applications where signal analysis, filtering, modulation,
demodulation, or other forms of signal manipulation in the frequency domain
must be performed in real time. The SFFT algorithm is similar to the discrete
Fourier transform (DFT). The SFFT is equivalent to overlapped FFTs whose
windows advance by 1 sample, in that the past frequency data is reused to
calculate the frequency spectrum of the next sample window. The calculation
is performed by adding the frequency domain spectrum of the new sample, while
simultaneously subtracting the frequency domain spectrum of the oldest
sample. Understanding the SFFT does not require prior knowledge of the DFT or
FFT. In addition, the SFFT can be used to derive the DFT equation, which
makes it useful both to DSP beginners and to DSP experts looking for a
different approach to solve a problem.


6.8.1 SFFT Theory: A Better Way to Use the Impulse Response 
The SFFT is based on the following simple concepts:


1) The property of superposition allows two or more signals to be added
linearly to create a new signal. A sampled time domain signal is the
summation of a series of individual input samples or impulses of varying
magnitude (Figure 6-10a). Similarly, signals, or impulses, can be subtracted.


If an input signal sample buffer (Figure 6-10a) of data is kept in memory, a
sliding rectangular window of data samples (Figure 6-10b and
Figure 6-10d) can be constructed by adding the newest sample and
subtracting the oldest sample (Figure 6-10c) from the previous
windowed signal (Figure 6-10b). The following diagram shows how the
addition and subtraction of samples can "slide" a window of data samples
from those shown in Figure 6-10b to those shown in Figure 6-10d.
Figure 6-10. Input Signal Sample Buffer (panel d: next windowed signal, with
the window time-shifted by 1 sample; T = time)


2) The frequency domain response of an impulse (a single sample point
where all other data points are zero) is flat, with a magnitude in each
frequency bin equal to the impulse input magnitude. Conversely, the impulse
is the additive result of many sinusoidal frequency components. The time
when the impulse occurs within the sample window is determined by the phase
angles of the individual component frequencies: an impulse’s time of arrival
corresponds to a linear phase shift from one frequency bin to the next.
3) In the frequency domain, the addition of frequency samples also follows
the rules of superposition.


The spectrum of Figure 6-10c, the new-minus-old sample window, is added to
the spectrum of Figure 6-10b, the original windowed signal, to create the new
spectrum of Figure 6-10d. The difference is that complex data is used in the
frequency domain to represent the phase information of the individual
component frequencies.


4) The summation of a series of simple impulse transforms, which have
correspondingly simple frequency domain transforms, results in the composite
frequency domain transform of the signal.


5) A sliding rectangular window is created by subtracting the Nth oldest
sample, which, in the frequency domain, will have gone through a multiple of
2*pi radians of rotation.
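
Concepts 2 and 5 can be checked numerically. The following Python sketch is an illustration only; the function name and the sign convention exp(+j*2*pi*n*k/N) for a sample that is k sample periods old are assumptions chosen to match the twiddle factors used later in this section.

```python
import cmath
import math


def impulse_bins(magnitude, delay, n_window):
    """Frequency bins of a single impulse that is `delay` samples old.

    Every bin has the same magnitude; only the phase depends on the delay.
    """
    return [magnitude * cmath.exp(2j * math.pi * n * delay / n_window)
            for n in range(n_window)]


bins = impulse_bins(2.0, 3, 16)
magnitudes = [abs(b) for b in bins]                   # flat response
phase_step = cmath.phase(bins[1] / bins[0])           # constant from bin to bin
oldest = impulse_bins(2.0, 16, 16)                    # rotated a multiple of 2*pi
```

Here `magnitudes` is flat at 2.0, `phase_step` equals 2*pi*3/16, and `oldest` matches the bins of an impulse at delay 0, which is why the Nth oldest sample can be subtracted without any further rotation.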


Note:


In some applications, complex time domain inputs may also be useful. For this
application, only the REAL data from an ADC is used.


6.8.2 Frequency Response Calculation 
If an impulse sample occurs at T = 0, the frequency response calculation is
further simplified since the response contains only REAL and no IMAG
components. The transform of an impulse at T = 0 simply stores the magnitude
of the impulse in each REAL bin and zeroes each IMAG bin.


If T != 0, the time shift creates a phase shift or complex vector rotation within 
each frequency bin. The phase rotation angle is proportional to the time shift 
and the frequency of interest. 


If the time shift is one sample period, as used in the SFFT, special conditions
can be applied. At low frequencies, the amount of phase shift from sample to
sample is low, or in the case of 0 Hz, zero radians of phase. At higher
frequencies, the phase rotation is greater. At the Nyquist frequency, the
vector rotation is pi radians per sample, which corresponds to 2 samples per
sine wave cycle. The vector rotation for bins between DC and the Nyquist rate
is proportional to the bin frequency.
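
The relationship between bin number and rotation per sample can be written out directly. This is a small sketch with hypothetical function names, assuming N bins spaced evenly around the unit circle.

```python
import math


def rotation_per_sample(n, n_window):
    """Phase rotation (radians) applied to bin n for a one-sample time shift."""
    return 2.0 * math.pi * n / n_window


def samples_per_cycle(n, n_window):
    """Number of samples in one sine wave cycle at bin n (n > 0)."""
    return 2.0 * math.pi / rotation_per_sample(n, n_window)
```

For a 64-point window, bin 0 rotates by 0 radians, bin 16 by pi/2, and the Nyquist bin 32 by pi, which is 2 samples per cycle.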
A Fourier transform also produces both negative and positive frequencies,
which, for REAL input data, are mirror images of each other. Only positive
frequencies need to be computed. This is suitable for spectrum analysis and
filtering. The ranges for n and the resulting complex rotation vectors
(twiddle factors) for each bin are:


Positive frequencies   0 <= n < N/2
Negative frequencies   -N/2 <= n < 0
complex(R_phase, I_phase) = exp(j*2*pi*n/N)
REAL_tw[n] = cos(n*2*pi/N)
IMAG_tw[n] = sin(n*2*pi/N)


The basic SFFT operation is a vector rotation of each previous bin value,
followed by the addition of the newest sample and the subtraction of the
oldest sample. Although it is a simple operation, all bins must be computed
before the next input sample is ready.

NewBinVal = (New - Old) + (OldBinVal * vect_rotate)

Bin[n] = (Sample[0] - Sample[N-1]) + (Bin[n] * exp(j*2*pi*n/N))
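
The update equation can be exercised with a short Python model. This is a sketch under stated assumptions, not the SFFT.ASM code: the helper names are hypothetical, the history buffer is kept newest-first, and the reference transform is defined as Bin[n] = sum over k of Sample[k] * exp(j*2*pi*n*k/N), where k is the age of the sample.

```python
import cmath
import math


def sfft_update(bins, newest, oldest, n_window):
    """One SFFT step: rotate every bin, then add (newest - oldest)."""
    delta = newest - oldest
    return [delta + b * cmath.exp(2j * math.pi * n / n_window)
            for n, b in enumerate(bins)]


def direct_bins(window_newest_first, n_bins):
    """Reference: recompute each bin from the whole window."""
    n_window = len(window_newest_first)
    return [sum(s * cmath.exp(2j * math.pi * n * k / n_window)
                for k, s in enumerate(window_newest_first))
            for n in range(n_bins)]


def run_sfft(samples, n_window, n_bins):
    """Feed samples one at a time, starting from an all-zero window."""
    history = [0.0] * n_window      # newest sample first
    bins = [0j] * n_bins
    for x in samples:
        oldest = history[-1]        # sample about to leave the window
        bins = sfft_update(bins, x, oldest, n_window)
        history = [x] + history[:-1]
    return bins, history
```

After any number of input samples, the bins returned by `run_sfft` should agree with `direct_bins` applied to the current window, to within rounding error. Each step costs one complex multiply per bin instead of a full transform.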


6.8.3 Visualizing the SFFT 


The easiest way to visualize the SFFT is to consider that each new sample 
occurs at T = 0, making each new sample all REAL in the frequency domain. 
Then, since the past summation is time-shifted by one sample, a vector rota- 
tion proportional to the frequency is applied. A schematic representation for 
an SFFT bin is shown in Figure 6-11. 


Figure 6-11. Frequency Bin Diagram (Equivalent to an IIR Filter)

Where: Vector_rotation_rate[n-th Freq] = 2*PI * n/(N*Fs)
K1 & K2 force convergence (see section 6.8.4)


6.8.4 Fbin Convergence and Stability


One aspect of the SFFT is that there is a feedback loop, which affects the
stability of the bin values. This is similar to an IIR filter where, in the
Z domain, a pole sits on the unit circle. To maintain stability and keep the
bin values from growing out of control, the magnitude of the complex vector
rotation twiddles must be set to slightly less than 1, placing the pole
inside the unit circle. This causes the impulse energy magnitude in each bin
to decay exponentially towards zero. With a stability factor K1 applied at
each rotation, by the Nth bin rotation an impulse has decayed to K1^N of its
original magnitude. To subtract the Nth oldest sample correctly, that sample
is therefore scaled by a second coefficient, K2 = K1^N. A side effect of the
exponential decay is that the SFFT is now windowed by an exponentially
decaying window. To minimize this effect, keep K1 close to 1.000 (0.999, for
example).
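
A minimal Python sketch of the stabilized update follows. It assumes that K1 scales the twiddle factor and that K2 = K1^N scales the outgoing sample, as described above; the function name and the default K1 value are illustrative choices, not values taken from SFFT.ASM.

```python
import cmath
import math


def damped_sfft_update(bins, newest, oldest, n_window, k1=0.999):
    """One SFFT step with the pole pulled inside the unit circle.

    Each bin is rotated and scaled by K1; the oldest sample is removed
    with weight K2 = K1**N so that it cancels exactly.
    """
    k2 = k1 ** n_window
    delta = newest - k2 * oldest
    return [delta + b * k1 * cmath.exp(2j * math.pi * n / n_window)
            for n, b in enumerate(bins)]
```

Feeding a single impulse followed by N zero samples, with the history buffer supplying the outgoing sample at each step, should return every bin to zero, which shows that the K2 scaling removes the decayed impulse completely.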


6.8.5 SFFT Windowing 
Unlike the FFT and DFT, SFFT windowing cannot be performed in the time
domain; the input window is moving in time and, therefore, the window function
must also move in time. The SFFT windowing operation is performed in the
frequency domain using a technique known as convolution. Windowing is a
multiplicative process in the time domain whose desirable effect is to smooth
out the sharp discontinuities at the endpoints that accompany a rectangular
data window. Without a smoothing window, these abrupt changes smear the
frequency spectrum over many bins. In the frequency domain, the coefficients
of most windowing functions are simple and do not require large storage
arrays. For the raised cosine window function, the coefficients are
particularly simple (-.5, +1.0, -.5) and are easily embedded into the code as
addition and subtraction. Note that frequency domain windowing (convolutional
window filtering) is applied to the REAL and IMAG data separately, before the
REAL/IMAG data is combined into a magnitude. The operation is fast and only
occurs during output. Furthermore, other window functions are rapidly and
easily implemented by selecting different convolution coefficients.
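
The equivalence between the three-point convolution and a time domain raised cosine window can be verified with a short Python sketch. The names are hypothetical, the bins are assumed to wrap around modulo N, and the time domain window is taken to be w[k] = 1 - cos(2*pi*k/N), which is the window that corresponds to the coefficients (-.5, +1.0, -.5) under the exp(+j*2*pi*n*k/N) convention assumed here.

```python
import cmath
import math


def bins_of(samples, weights=None):
    """Bin[n] = sum_k w[k] * x[k] * exp(j*2*pi*n*k/N) for all N bins."""
    size = len(samples)
    if weights is None:
        weights = [1.0] * size
    return [sum(w * s * cmath.exp(2j * math.pi * n * k / size)
                for k, (s, w) in enumerate(zip(samples, weights)))
            for n in range(size)]


def raised_cosine_in_frequency(bins):
    """Apply the (-.5, +1.0, -.5) convolution across neighboring bins."""
    size = len(bins)
    return [bins[n] - 0.5 * bins[(n - 1) % size] - 0.5 * bins[(n + 1) % size]
            for n in range(size)]
```

Convolving the unwindowed bins with these coefficients should give the same result as transforming the samples after multiplying them by the raised cosine window, so no window table or time domain multiply is needed.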
Figure 6-12. Raised Cosine Window


6.8.6 Using SFFT.ASM for Spectrum Analysis


If the SPECT_EN variable is set to 1 (true), the DSK analog output is config- 
ured to be the computed spectrum of the analog input beginning at 
BIN_START and ending at BIN_END. The output is then viewed using an oscil- 
loscope, which is triggered on a positive synch pulse. The DAC output voltage 
is proportional to the log magnitude of each frequency bin. 


To help pass impulses with minimal magnitude errors, each DAC output sample
can be repeated up to DAC_FPT times. Also, the AIC TA register value can
be programmed to have a very high pass band. This increases the DAC output
distortion, which is a problem if used for audio applications, but is
acceptable for visual purposes.


Also, the BIN_START and BIN_END values do not need to begin at zero or end
at SFFTSIZE/2. This can be used to show that the frequency bins repeat in the
frequency domain, as predicted by the discrete Fourier transform. The only
restrictions are the available memory and CPU processing power.


6.8.7 Using SFFT.ASM for Hilbert Transforms and Arbitrary Phase Angle Filters


If SPECT_EN is set to 0, the output is configured to be the summation of the 
reconstructed REAL and IMAG components. 


An arbitrary output phase angle is implemented by performing a complex mul- 
tiplication of the REAL and IMAG components by a complex vector determined 
by the ANGLE parameter. If ANGLE = 90°, the Hilbert transform is recon- 
structed from the pass-band SFFT bins covering BIN_START to BIN_END. If 
ANGLE = 0.0, no phase shift occurs. 
The 0° and matched 90° phase shift Hilbert transform is useful in
telecommunications applications, where the quadrature outputs are used to
shift the spectrum of a signal or in radio and modem modulation schemes.
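
The phase rotation by ANGLE amounts to one complex multiplication. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that the output is the real part of (REAL + j*IMAG) * exp(j*ANGLE). The sign convention actually used by SFFT.ASM is not stated in the text.

```python
import math


def phase_shifted_output(real_sum, imag_sum, angle_degrees):
    """Real part of (real_sum + j*imag_sum) rotated by the given angle."""
    angle = math.radians(angle_degrees)
    return real_sum * math.cos(angle) - imag_sum * math.sin(angle)
```

With ANGLE = 0 the output is the REAL summation unchanged; with ANGLE = 90 it is built entirely from the IMAG summation, which gives the quadrature (Hilbert) output under this convention.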


6.8.8 Raised Cosine Windowed Filters 


By applying the raised cosine window to the summation of bin values, the 
REAL or IMAG filter response ripple is improved. 


The method implemented uses a series of coefficients that are applied to each 
frequency bin and then added much like an FIR filter, except in the frequency 
domain. 


The coefficient values result from both:


- The convolution of the response of a raised cosine function with the signal
  response

- The multiplication of a rectangular bandpass filter, also applied in the
  frequency domain


A group delay, or time shift, is also seen which is equal to N/2 plus the time it 
takes a signal to make it through the ADC/DAC conversion process. 


In Figure 6-13 through Figure 6—16, the number of bins required is actually 
WIDTH + 2 for a given pass-band bandwidth and the signs of the coefficients 
alternate (+, —, +, —). The endpoints, which are also scaled by 50%, are the 
result of the window coefficients and define the edge characteristics of the 
filter. 


Figure 6-13. Raised Cosine Window Function (Length = 1 Bin)

Figure 6-14. Raised Cosine Window Function (Length = 2 Bins)

Figure 6-15. Raised Cosine Window Function (Length = 3 Bins)

Figure 6-16. Raised Cosine Window Function (Length = 4 Bins)


6.8.9 Non-Windowed SFFT


A special case occurs when the SFFT is used to compute the all pass 0° and
90° Hilbert transforms of a non-windowed synchronized signal. Frequency bin
spreading occurs if the signal is not harmonically related to the sample
window.


For REAL summations, the input is reconstructed by scaling the 0 or DC bin 
by 50%. This scaling compensates for a 2:1 rise in signal level since all bin data 
energy, except for the 0 bin, is split equally between the positive and negative 
frequencies. 


At the 0 bin, there is no IMAG information, since no phase shift is applied to 
that bin. A DC component for an IMAG reconstruction, therefore, does not 
exist. 


Figure 6-17. N/2 SFFT R/I Bins


6.8.10 Performance


Since the SFFT needs only to compute the bins of interest within the span of 
one time sample, narrow band analysis or filtering is very efficient, even when 
the effective FFT size is very large. If large numbers of bins and/or high sam- 
pling rates are impractical for a single processor, a traditional block style FFT 
or filter may be more practical. 


For example, in a filter application, only a few frequency bins may be
required; the unused bins are zero since they are not needed for
reconstruction. The minimum sampling period, which sets the maximum sampling
rate (or the number of bins that can be calculated), is shown in the
following equation.


Ts(min) = (SFFT_cycles_per_bin * bins + loop_overhead) * ns/cycle
Ts(min) = (7 * N/2 + 52) * 40 ns


Note:


The loop overhead value is the time consumed by interrupt routines, data
formatting, input, and output. SFFT.ASM is not highly optimized, since it is
for educational purposes.


The loop can be optimized by inlining the three major functions (Input,
SFFT, and Output) to remove 3 calls and 3 returns (or 24 cycles) from the
loop overhead.
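

As a worked example of the Ts(min) equation, the following Python sketch
evaluates it for two bin counts. The function names are hypothetical. The
defaults (7 cycles per bin, 52 cycles of loop overhead, 40 ns per cycle) are
the figures quoted in the equation and should be re-measured for any
modified version of the code.

```python
def ts_min_ns(bins, cycles_per_bin=7, loop_overhead=52, ns_per_cycle=40):
    """Minimum sampling period in nanoseconds for `bins` computed bins."""
    return (cycles_per_bin * bins + loop_overhead) * ns_per_cycle


def max_sample_rate_hz(bins, **kwargs):
    """Highest sampling rate that still leaves time to compute all bins."""
    return 1e9 / ts_min_ns(bins, **kwargs)


N = 512
print(ts_min_ns(N // 2))                  # 73760 ns for all 256 positive bins
print(round(max_sample_rate_hz(N // 2)))  # 13557 Hz
print(ts_min_ns(65))                      # 20280 ns for a 65-bin pass band
```

Computing only a 65-bin pass band instead of all 256 positive-frequency bins
shortens the minimum sampling period by roughly a factor of 3.6 in this
model.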


6.8.11 Loop Unrolling for High Speed Filtering 


The inner loop of the SFFT consumes 5 computational cycles, but executes
in 6 cycles. The conflict occurs from a data bus bandwidth limitation and
results from the STF||STF operation immediately preceding a double load of
data for the MPYF3 instruction.


This null cycle is filled by moving the filter summations within the loop. The 
summation can be done entirely within registers and requires no data path 
access. 


The +1, -1 convolutional filter coefficients for raised cosine windowing can be
hard coded within the loop by performing subtractions that invert the sum each
time it goes through the loop. This avoids fetching coefficients from the data
bus.
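

The running-subtraction trick can be checked with a short Python model. This
is an illustrative sketch, not the SFFT.ASM code; both function names are
hypothetical. It shows that replacing the sum with (new value - sum) on every
pass gives the same result as multiplying the values by explicit alternating
+1 and -1 coefficients.

```python
def alternating_sum_running(values):
    """Each pass replaces the sum with (value - sum), so the sign of
    everything accumulated so far is inverted without any coefficient
    fetch."""
    total = 0.0
    for x in values:
        total = x - total
    return total


def alternating_sum_explicit(values):
    """Reference version with explicit +1/-1 coefficients. The last
    value always carries +1 and the signs alternate going backwards."""
    n = len(values)
    return sum(((-1.0) ** (n - 1 - k)) * x for k, x in enumerate(values))


bins = [1.0, 2.0, 4.0, 8.0]
print(alternating_sum_running(bins))   # 5.0
print(alternating_sum_explicit(bins))  # 5.0
```

Because the last value always ends up with a positive sign, the sign carried
by the first value depends on whether the number of values is odd or even.
A conditional negation after the loop, as the listing applies based on the
bin count, fixes the overall sign.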


Overall, the forward and reverse SFFT are computed at 6-7 cycles per bin,
depending on whether both REAL and IMAG outputs are required. The
general-case educational example SFFT.ASM is slightly slower than SFFT2.ASM,
which is written for filtering.


6.8.12 Fitting the Code and Data Into Memory 


If the effective desired SFFT/FFT size is 512 points, then only 256 positive fre- 
quencies need to be computed. With R/I twiddle and R/I SFFT data associated 
with each bin, 1024 words of memory are required. In addition, 512 words of 
input buffer data are needed. 


To maximize speed, the inner loop of the SFFT uses dual access on-chip 
memory to access data at the rate of two data moves per CPU cycle. To avoid 
program fetch conflicts, the SFFT code is loaded into the second on-chip 
SRAM block, which also holds the data buffer. 


If off-chip memory is available, excellent performance is achieved by placing
as much SFFT bin data on-chip as possible. The input window sample buffer
and code can be external since the main code loop easily fits inside the cache
and the sample buffer is only accessed twice per SFFT cycle.


Note:


The SFFT only needs to calculate the difference of the input of the most
recent and the oldest data sample one time. This value is reused for all bin
calculations and is kept in a register.


If circular or bit-reversed data storage is used, the data and twiddle buffers
are forced to 2^N word boundaries. In addition, the circular addressing
registers are consumed. Since the overhead of checking and reloading the
buffer pointers is minimal and allows non-2^N sizes, explicit pointer testing
is used in SFFT.ASM.
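

The sample delay line with an explicit pointer test can be modeled as
follows. This Python sketch is an illustration under assumptions, not a
translation of the Input routine: the class and attribute names are
hypothetical, and `k2` stands in for the K2 scale factor that the listing
applies to the oldest sample.

```python
class DelayLine:
    """N-sample delay buffer that wraps its pointer with an explicit
    test, so the length does not have to be a power of two."""

    def __init__(self, length, k2=1.0):
        self.buf = [0.0] * length
        self.start = 0
        self.end = length        # one past the last slot
        self.ptr = self.start
        self.k2 = k2             # scale factor for the oldest sample

    def push(self, sample):
        """Store the newest sample over the oldest one and return
        (newest - k2 * oldest), which is computed once per sample and
        then reused for every bin."""
        oldest = self.buf[self.ptr]
        self.buf[self.ptr] = sample
        self.ptr += 1
        if self.ptr >= self.end:     # explicit end-of-buffer test
            self.ptr = self.start
        return sample - self.k2 * oldest


line = DelayLine(3)                  # 3 is deliberately not a power of two
print([line.push(float(v)) for v in (1, 2, 3, 4, 5)])
# [1.0, 2.0, 3.0, 3.0, 3.0]
```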


6.8.13 Using This Code With C


To use the functions in this code with a high level language such as C, you must 
perform context save and restore operations at the beginning and end of each 
function. 


6.8.14 TLC32040 ADC and DAC Considerations 


The application file SFFT.ASM is written to use a TLC32040 analog interface 
chip (AIC) connected as used in a TMS320C31 DSP Starter Kit or DSK 
(TMDS3200031). Further documentation for the DSK is available in the DSK 
or by downloading from the Texas Instruments FTP site. 


Files                          Location
Main TMS320 FTP mirror site    ftp://ftp.ti.com/mirrors/tms320bbs
C3x DSK files subdirectory     ftp://ftp.ti.com/mirrors/tms320bbs/c3xdskfiles


6.8.15 SFFT Summary 


- A time signal is composed of a series of samples.


- Each sample is an impulse.


- The time signal is a time summation of a series of impulses.


- The frequency spectrum of a single impulse at T = 0 is trivial to calculate,
  since it is only a REAL component in each frequency bin whose magnitude
  is that of the impulse.


- The frequency spectrum of a signal is the summation of the individual
  impulse responses.


- A shift in time is a shift in phase (or phase rotation) in the frequency
  domain.


- Consider each new impulse as occurring at T = 0 and perform the time shift
  on the past summation of samples as a whole.


- At each bin, the amount of phase rotation, or twiddle factor, that is
  applied is proportional to the frequency of the bin. The phase shift is
  zero at DC (n = 0) and pi radians at Fnyq (n = N/2).


- After phase rotating each bin, simply add the new sample/impulse value.
  (Don't forget to start with each bin magnitude as zero.)


- At this point, the Fourier transform is a forever expanding series in both
  the time and frequency domains.


- The Nth oldest sample is rotated n multiples of 2 x pi radians, making the
  Nth oldest sample completely REAL with no IMAG component.


- At N samples of age, phase rotation = N x (n x 2 x pi/N) = n x 2 x pi.


- A sliding rectangular window is created by subtracting the T = Nth oldest
  sample while adding the newest T = 0 sample. At T = N, each frequency
  bin has rotated N times and is back to 0 radians of phase and can be
  properly subtracted.
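

The steps in this summary can be condensed into a short numerical model. The
sketch below is written in Python purely for illustration and is not part of
SFFT.ASM. The function names are hypothetical, and the K1/K2 stability
factors used in the listing are left out (in effect K1 = K2 = 1) so that the
result can be compared directly with a brute-force sum over the window.

```python
import cmath


def run_sfft(samples, n_window, bin_numbers):
    """Sliding update: rotate every bin by its twiddle factor, then add
    (newest sample - sample that is now N samples old)."""
    w = 2.0 * cmath.pi / n_window
    twiddles = [cmath.exp(1j * n * w) for n in bin_numbers]
    state = [0j] * len(bin_numbers)      # every bin starts at zero
    history = [0.0] * n_window           # delay line of the last N samples
    for t, x in enumerate(samples):
        slot = t % n_window
        delta = x - history[slot]        # computed once, reused for all bins
        history[slot] = x
        state = [value * tw + delta for value, tw in zip(state, twiddles)]
    return state


def direct_sum(samples, n_window, bin_numbers):
    """Reference: a sample that is m samples old has been rotated m times."""
    w = 2.0 * cmath.pi / n_window
    recent = samples[::-1][:n_window]    # recent[m] is the sample of age m
    return [sum(x * cmath.exp(1j * n * w * m) for m, x in enumerate(recent))
            for n in bin_numbers]


samples = [float((7 * i) % 11) - 5.0 for i in range(40)]
sliding = run_sfft(samples, 8, [0, 1, 3])
reference = direct_sum(samples, 8, [0, 1, 3])
print(max(abs(a - b) for a, b in zip(sliding, reference)) < 1e-9)  # True
```

Each new sample costs one complex multiply and one add per computed bin,
regardless of the window length N.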


6.8.16 SFFT Algorithm 


SFFT.ASM (Example 6-18) is written for the DSP beginner, but contains
features that also make it useful to the experienced DSP programmer.
SFFT.ASM implements a continuous time Fourier transform which can be used to
construct filters and analyze spectra. It can also be used as a
general-purpose DSP teaching platform.


SFFT.ASM uses a technique known as a sliding FFT (SFFT) to efficiently
calculate the spectrum of a signal on a sample-by-sample basis. The SFFT is
particularly well-suited for applications where signal analysis, filtering,
modulation, demodulation, or other forms of signal manipulation in the
frequency domain must be performed in real time. The SFFT algorithm is
similar to the DFT.


Further reading and other information includes:


- Designer Notebook page 22, 'Fast Logarithms on a Floating Point Device'


- APPHELP1.TXT and APPHELP2.TXT included with the DSK software


- Texas Instruments' FTP site:


  Files                          Location
  Main TMS320 FTP mirror site    ftp://ftp.ti.com/mirrors/tms320bbs
  C3x DSK files subdirectory     ftp://ftp.ti.com/mirrors/tms320bbs/c3xdskfiles
  TMS320C3x code examples        ftp://ftp.ti.com/mirrors/tms320bbs/c3xfiles
  TMS320C4x code examples        ftp://ftp.ti.com/mirrors/tms320bbs/c4xfiles
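

The fast logarithm idea named in the Designer Notebook reference, which the
FLOG2 routine in the listing relies on, can be sketched in Python. This is
an analogy under assumptions rather than a model of the 'C3x code: it uses
the IEEE-754 single-precision layout, which differs from the 'C3x
floating-point format, so the scale factor (2^-23) and the exponent bias
(127) are specific to this sketch. The function name is hypothetical.

```python
import math
import struct


def fast_log2(x):
    """Approximate log2(x) by reading the bits of a float as an integer.

    The exponent field supplies the integer part of the logarithm and
    the mantissa bits act as a linear approximation of the fraction.
    Returns -1.0 for x <= 0, mirroring the error return in the listing.
    """
    if x <= 0.0:
        return -1.0
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits / float(1 << 23) - 127.0


for value in (1.0, 8.0, 10.0):
    print(value, fast_log2(value), math.log2(value))
```

The result is exact at powers of two. In between, the error of this linear
approximation stays below about 0.09, which is adequate for scaling a
spectrum display.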


The following section sets the SFFT parameters which determine the SFFT 
output characteristics. The following rules apply: 


- BIN_LEN = BIN_END - BIN_START > 0
- ((SFFTBINS x 4) + SFFTSIZE) < Free data space


- Sampling period > time to compute all bins


Be careful not to set the sampling rate too high while calculating many bin 
values. The SFFT must finish calculating all of its bin values within the time 
span of one sample. 
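

These rules can be checked before assembling with a few lines of Python. The
sketch below is an illustration only. The function name is hypothetical, the
free-space default (0xE40 - 0x800 words) is taken from the assembly-time
check in the listing, and the timing figures (7 cycles per bin, 52 cycles of
overhead, 40 ns per cycle) are the ones quoted in the performance section.

```python
def check_sfft_params(bin_start, bin_end, sfft_size, sample_period_ns,
                      free_words=0xE40 - 0x800,
                      cycles_per_bin=7, loop_overhead=52, ns_per_cycle=40):
    """Return a list of rule violations (empty if the settings are usable)."""
    problems = []
    bin_len = bin_end - bin_start
    sfft_bins = bin_len + 1
    if bin_len < 1:
        problems.append("BIN_LEN must be greater than 0")
    # Four words per bin (R/I twiddle plus R/I data) and one per sample.
    if sfft_bins * 4 + sfft_size > free_words:
        problems.append("bin and sample buffers exceed free data space")
    compute_ns = (cycles_per_bin * sfft_bins + loop_overhead) * ns_per_cycle
    if sample_period_ns <= compute_ns:
        problems.append("sampling period too short to compute all bins")
    return problems


# Default listing values with an assumed 48 us sample period
print(check_sfft_params(32, 96, 512, 48000))   # []
# Too many bins and too large a window for the same sample period
print(check_sfft_params(0, 255, 1024, 48000))
```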


The effective Fourier series size is determined by the size of the time window 
of samples. Although this does not affect the calculation rate, it does consume 
internal memory. 


Creating a pass band around a particular signal is easy, since the signal can
be viewed either in frequency or time by changing the setting of SPECT_EN.
With practice, you can zoom in on particular segments of frequency
by changing the start and stop bins, window size, and sampling rate.


The DAC output signal fidelity is largely determined by the TA register value
that is programmed into the AIC. No one value seems to fit all applications.
However, the following rules generally apply. If TA is small, the DAC
reconstruction filter is clocked at a faster rate. This pushes the upper
pass-band limit higher in frequency, resulting in faster slew times. This is
desirable for a spectrum analyzer output where fast impulse response to
frequency peaks is needed for suitable viewing. For audio applications, a
larger TA value is desired, since the overclocking of the DAC reconstruction
filter results in significant distortions.


The AIC master clock input is derived from the timer output pin of internal
timer 0. If the timer reference is set higher than the TLC32040 maximum clock
rate of 10 MHz, additional distortion occurs.


A TLC32040 analog interface circuit is used on the DSK since it responds
favorably when used beyond its tested limits. However, predicting
performance depends on many factors; experimentation may be required.


AIC setup registers are programmed into the AIC using a data word which is
tagged with xxxx11b in the bottom 2 LSBs to signal the AIC to accept a
secondary transmit (or register program) word.
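

The packing of the AIC register words and the secondary-transmit tag can be
illustrated in Python. The packing expression follows the A_REG and B_REG
definitions in the listing; the helper names are hypothetical. The
conversion-rate formula (master clock divided by 2 x A x B) and the 6.25 MHz
master clock are assumptions used for this example and should be confirmed
against the TLC32040 data sheet and the timer setup.

```python
def pack_aic_word(high_divisor, low_divisor, select):
    """Pack two divisor fields and a 2-bit register select, following
    the (TA<<9) + (RA<<2) + 0 form used for A_REG in the listing."""
    return (high_divisor << 9) + (low_divisor << 2) + select


def request_secondary(dxr_word):
    """Tag a primary data word with 11b in the two LSBs so that the AIC
    expects a secondary (register program) word next."""
    return dxr_word | 0b11


def clear_secondary(dxr_word):
    """Clear the two LSBs so that no secondary transmit is requested."""
    return dxr_word & ~0b11


def conversion_rate_hz(mclk_hz, a_divisor, b_divisor):
    """Assumed relation: conversion rate = MCLK / (2 * A * B)."""
    return mclk_hz / (2.0 * a_divisor * b_divisor)


TA, TB, RA, RB = 6, 25, 10, 15           # values from the listing
A_REG = pack_aic_word(TA, RA, 0)
B_REG = pack_aic_word(TB, RB, 2)
print(hex(A_REG), hex(B_REG))            # 0xc28 0x323e
print(round(conversion_rate_hz(6.25e6, RA, RB)))  # 20833
```

With these assumptions the ADC and DAC both run at about 20.8 kHz, since
TA x TB and RA x RB are both 150.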


The DAC switched-capacitor filter rate is set by the TA divisor. A low TA
value, used to overclock the DAC reconstruction filter, trades signal
fidelity for faster impulse response times.


This application was designed and tested using a 50 MHz TMS320C31 DSP
Starter Kit (TMDS3200031) which includes a TLC32040 14-bit ADC/DAC.


Example 6-18. SFFT.ASM


; SFFT2.ASM
; Keith Larson
; TMS320 DSP Applications
; (C) Copyright 1996,1997,1998
; Texas Instruments Incorporated
;
; This is unsupported freeware with no implied warranties or
; liabilities. See the C3x DSK disclaimer document for details
;
; Default setup
;   SPECT_EN
;   Fs        20.8 kHz (48 uS)
;   Hz/bin    40.7 Hz
;   Range     1.3 kHz - 3.9 kHz
;
; If this file is re-assembled with SPECT_EN set to 0, this will give a
; bandpass filter from 1.3 - 3.9 kHz having 90 degrees phase shift at all
; frequencies

SFFTSIZE  .set    512               ; Sample Window length (FFT size)
BIN_START .set    32                ; Start computing SFFT at this bin
BIN_END   .set    96                ; End computing SFFT at this bin
;
ANGLE     .set    90.0              ; Filter reconstruction angle (degrees)
;
SPECT_EN  .set    1                 ; Enable spectrum analyzer output
RATE      .set    2                 ; Write display points RATE times each
;
TIM0_prd  .set    2                 ; AIC reference clock is TIM0
TA        .set    6                 ; DAC setup
TB        .set    25                ;
RA        .set    10                ; ADC setup
RB        .set    15                ;
;
; PARAMETERS BELOW THIS LINE ARE COMPUTED FROM THE INFORMATION
; ABOVE. THERE IS NO NEED TO MODIFY ANYTHING BELOW THIS POINT
;
BIN_LEN   .set    BIN_END-BIN_START ; Filter length in bins
SFFTBINS  .set    BIN_LEN+1         ;
N         .set    SFFTSIZE          ; 'N' used as shorthand for SFFTSIZE
TR        .set    0                 ; Real twiddle offset in each cell
TI        .set    1                 ; Imag
DR        .set    0                 ; Real data offset in each cell
DI        .set    1                 ; Imag
RIBINSIZE .set    2                 ; Size of R/I element pair
pi        .set    3.14159265        ; Useful in making apple pie
w         .set    2.0*pi/N          ; angle = F * 2*pi/Fs
OVM       .set    0x80              ; Use overflow mode to saturate results
;
; If the input parameters won't work, generate a descriptive error
; for the user letting them know what to look for and maybe fix
;
          .if     (BIN_LEN < 1)
APP MESSAGE: Calculated BIN_LEN must be >1
          .endif
          .if     ((SFFTBINS*4) + SFFTSIZE) > (0xE40-0x800)
APP MESSAGE: The Fbin and data storage buffers are too big for the DSK
          .endif
;
; The SFFT twiddles, data, and input buffer arrays are allocated
; to be placed into RAM0 to avoid bus conflicts with program fetching
;
          .include "C3XMMRS.ASM"    ;
          .start  "DATA",0x809800   ; Data arrays are placed at start of RAM0
          .sect   "DATA"            ;
TWIDCOEF                            ;
n         .set    BIN_START         ;
          .loop   SFFTBINS          ; R/I phase or twiddle coefficients
          .float  K1*cos(n*w)       ;
          .float  K1*sin(n*w)       ;
n         .sdef   n+1.0             ; next 'n'
          .endloop                  ;
SFFTDATA                            ;
          .loop   SFFTBINS          ; R/I frequency bin data
          .float  0,0               ; Pre-Zeroing bin data removes
          .endloop                  ; startup glitches
BUF                                 ;
          .loop   N/2               ; N samples of ADC input delay data
          .float  0,0               ;
          .endloop                  ;
;
; The application code begins here, beginning with constants that
; are used in various routines.
;
Tbase     .word   TWIDCOEF          ; Location of twiddle coefficients
Bbase     .word   SFFTDATA          ; Location of R/I SFFT Bin data
CircAddr  .word   BUF               ; Current pointer into sample data
BUFSTART  .word   BUF               ; Start address of sample data
BUFEND    .word   BUF+N             ; End address of sample data
OutBin    .float  0                 ; Current spectrum analyzer bin
MAX       .float  32000.0           ; Used synch pulse and scaling
A_REG     .word   (TA<<9)+(RA<<2)+0 ; Packed AIC register values
B_REG     .word   (TB<<9)+(RB<<2)+2 ;
C_REG     .word   00000011b         ;
;S0gctrl  .word   0x0E970300        ; Sport setup, noninverted clkx/clkr
S0gctrl   .word   0x0E973300        ; Sport setup, inverted clkx/clkr
S0xctrl   .word   0x00000111        ;
S0rctrl   .word   0x00000111        ;
NewMnsOld .word   0                 ;
K1        .set    0.99995           ; Use a value slightly less than 1.0
K2        .float  pow(K1,N)         ; K1^N oldest sample scale factor
FILTEROUT .float  0.0               ; Temp storage for SFFT filter output
Scale     .float  4.0/N             ; SFFT growth scale factor
REAL_VEC  .float  -cos(pi*ANGLE/180.0) ; filtered REAL scale factor
IMAG_VEC  .float  -sin(pi*ANGLE/180.0) ; filtered IMAG scale factor
FLOG2SC   .float  pow(2.0,-24.0)    ; Scale factor for log2 calculations
bigval    .word   0x00010000        ; Used in overflow mode saturation

;
; The main loop consists of waiting for a new ADC sample.
; When a receive interrupt occurs, new data is loaded into the
; data delay line buffer, followed by the SFFT and output routines.
; Four dummy writes to the external bus have been added in the main
; loop to allow real time benchmarking of the three functions using
; an oscilloscope to monitor the address bus LSB's
;
          .start  "CODE",0x809E40   ; Start in last 512 words of RAM0
          .sect   "CODE"            ; (also includes DSK kernel)
main      ldi     0xE4,IE           ; Enable XINT/RINT/INT2
          idle                      ; Wait for Receive Interrupt
;
          ldi     @S0_rdata,R0      ; The first interrupt occurs shortly
          ldi     0,R0              ; after AIC init is complete, which
          sti     R0,@S0_xdata      ; will not leave enough time for SFFT
;
loop      idle                      ; Wait for Receive Interrupt
          sti     R0,@0x80A000      ;<1
          call    Input             ; Put ADC sample in delay buffer
          sti     R0,@0x80AF03      ;<2
          call    SFFT              ; Calculate SFFT
          sti     R0,@0x80AF0F      ;<3
          call    Output            ; Output result
          sti     R0,@0x80AF3F      ;<4
          b       loop              ; Loop back and do forever
;
; The ADC data is read and buffered here
;
Input     ldi     @S0_rdata,R0      ; get ADC data
          ash     -16,R0            ; Sign extend previous sample in MSB's
          float   R0,R0             ; Convert the ADC data to float
          ldi     @CircAddr,AR0     ; Load present circ buf address
          ldf     *AR0,R7           ; Multiply by 'K2' for bin stability
          mpyf    @K2,R7            ; (see text)
          stf     R0,*AR0++         ;
          cmpi    @BUFEND,AR0       ; If at end of buffer, point to start
          ldige   @BUFSTART,AR0     ;
          subrf   R0,R7             ; R7 = X[0] - K2*X[-N]
          sti     AR0,@CircAddr     ; save new 'circular' modified ptr
          stf     R7,@NewMnsOld     ;
          rets                      ;


;
; The forward and reverse SFFT are calculated within this one loop
; The loop itself is unrolled to achieve an inner loop cycle count
; of 7 cycles per bin calculation. The inner loop contains both the
; REAL and IMAG filter summations, so if the output is for spectrum
; analysis or only one filter sum is required, one or both summations
; can be removed giving an inner loop speed of 6 cycles/bin
;
SFFT      ldi     @Tbase,AR0        ; R/I twiddle ptr
          ldi     @Bbase,AR1        ; R/I SFFT array ptr
          ldi     @Bbase,AR2        ; SFFT output (usually in place)
          ldi     SFFTBINS-1,RC     ; Number of bins to calculate
          ldi     RIBINSIZE,IR0     ; Size of R/I pair in array
          ldf     @NewMnsOld,R7     ; R7 = (New - K2*Old)
;
          ldf     0,R4              ; Zero the REAL filter sum
          ldf     0,R5              ; Zero the IMAG filter sum
;
          mpyf3   *+AR0(TR),*+AR1(DR),R0     ; TR*DR <- unroll from main loop
          rptb    EndSFFT                    ;
;
Loop      mpyf3   *+AR0(TR),*+AR1(DI),R1     ; TR*DI
          mpyf3   *+AR0(TI),*+AR1(DI),R0     ; TI*DI
       || addf3   R7,R0,R3                   ; (TR*DR + DELTA)
          mpyf3   *+AR0(TI),*+AR1(DR),R0     ; TI*DR
       || subf3   R0,R3,R3                   ; TR*DR - TI*DI + DELTA
          mpyf3   *++AR0(IR0),*++AR1(IR0),R0 ; TR*DR (used in next loop)
       || addf3   R1,R0,R2                   ; TR*DI + TI*DR
          stf     R2,*+AR2(DI)               ; Save the new Fbin values
       || stf     R3,*AR2++(IR0)             ;
;
          subf3   R4,R3,R4          ; REAL sum; sum'=R-sum alternates sign of
EndSFFT   subf3   R5,R2,R5          ; IMAG sum; raised cosine window coefficients
;
; For raised cosine window filters the endpoint bin values
; are scaled to 1/2 relative to the pass bins
;
          addf    R4,R4             ; Double inner +/-1 sum loop
          addf    R5,R5             ;
          subf    R3,R4             ; Subtract endpoints at 50%
          subf    R2,R5             ;
          ldi     @Bbase,AR1        ; ptr to start of R/I SFFT array
          ldf     *+AR1(DI),R2      ;
       || ldf     *+AR1(DR),R3      ;
          .if     SFFTBINS&1        ; If the loop count was odd, the
          mpyf    -1,R4             ; +,-,+,- sum result is negative
          mpyf    -1,R5             ;
          .endif                    ;
          addf    R3,R4             ;
          addf    R2,R5             ;
;
; When the SFFT is finished, the REAL/IMAG sums are scaled
; accordingly for the desired output phase angle. A 'growth'
; scale factor is also applied since the summation occurs
; over N data points.
;
ExitSFFT  mpyf    @REAL_VEC,R4      ; Rotate to desired output phase
          mpyf    @IMAG_VEC,R5      ;
          addf3   R4,R5,R0          ; Sum the R/I into a REAL output
          mpyf    @Scale,R0         ; inverse of N/2 growth
          stf     R0,@FILTEROUT     ;
          rets                      ;


;
; The output section is written for both Spectrum analyzer output
; as well as REAL/IMAG filter sum outputs
;
Output:   .if     SPECT_EN=0        ; If SPECT_EN=0 (disable) output either
          ldf     @FILTEROUT,R0     ; Output REAL/IMAG bin sum
          .else                     ;
;
; The Spectrum analyzer output section is bypassed
; if the spectrum analyzer is not enabled
;
          ldf     @OutBin,R0        ; Point to next output bin
          addf    1.0/RATE,R0       ; increment analyzer output pointer
          cmpf    BIN_LEN,R0        ;
          ldfge   0,R0              ;
          stf     R0,@OutBin        ;
          fix     R0,R0             ;
          bzd     Out               ;
          mpyi    RIBINSIZE,R0      ; Fbins are 2 words (R/I) per bin
          ldfz    @MAX,R0           ; If at base Fbin 0 Hz, output a synch
          ldi     @Bbase,AR0        ;
          subi    2,AR0             ; point to output bin-1 to perform
          addi    R0,AR0            ; -.5,1.0,-.5 convolutional window
;
          ldf     *+AR0(DI+0),R0    ; Perform convolutional window filter
       || ldf     *+AR0(DR+0),R2    ; on the R/I pairs for this output
          addf    *+AR0(DI+4),R0    ;
          addf    *+AR0(DR+4),R2    ;
          mpyf    -0.5,R0           ; Scaling coefficient for -1,+1 bins
          mpyf    -0.5,R2           ;
          addf    *+AR0(DI+2),R0    ;
          addf    *+AR0(DR+2),R2    ;
          mpyf    R0,R0             ; Calculate REAL^2 + IMAG^2 magnitude
          mpyf    R2,R2             ;
          addf    R2,R0             ;
          call    FLOG2             ; Convert to log2(), then scale
          mpyf    32,R0             ; and shift for best display
          mpyf    32,R0             ;
          subf    @MAX,R0           ;
iy Gey os — 
          .endif                    ;
Out       fix     R0,R0             ; Convert to integer DAC output
          mpyi    @bigval,R0        ; Use Overflow mode ALU saturation
          ash     -16,R0            ;
          andn    3,R0              ; Do not request a 2nd xmit
          sti     R0,@S0_xdata      ; Output DAC value to serial port
          rets                      ;

;
; FLOG2() Ultra Fast LOG2 function
; computes log2(R0) and returns e8/s1/m4 accuracy float value in R0
;
FLOG2:    cmpf    0.0,R0            ; Exit if value is <= Zero
          ldfle   -1,R0             ; if x<=0 return -1 (error)
          retsle                    ; return if X<=0
          lsh     1,R0              ; Concatenate mantissa to exponent
          pushf   R0                ; Convert 'fast log' to int, then float
          pop     R0                ; Value is accurate but scaled by 2^24
          float   R0,R0             ;
          mpyf    @FLOG2SC,R0       ; Mpy by scale factor
          rets                      ;
;
; The startup stub is used during initialization only and can be
; overwritten by the stack or data after initialization is complete.
; Note: A DSK or RTOS communications kernel may also use the stack.
; In this case be sure to not put the stack here during debug.
;
          .entry  ST_STUB           ; Debugger starts here
ST_STUB   ldp     T0_ctrl           ; Use kernel data page and stack
          ldi     @stack,SP         ;
          ldi     0,R0              ; Halt TIM0 & TIM1
          sti     R0,@T0_ctrl       ;
          sti     R0,@T0_count      ; Set counts to 0
          ldi     TIM0_prd,R0       ; Set period
          sti     R0,@T0_prd        ;
          ldi     0x2C1,R0          ; Restart both timers
          sti     R0,@T0_ctrl       ;
          ldi     @S0xctrl,R0       ;
          sti     R0,@S0_xctrl      ; transmit control
          ldi     @S0rctrl,R0       ;
          sti     R0,@S0_rctrl      ; receive control
          ldi     0,R0              ;
          sti     R0,@S0_xdata      ; DXR data value
          ldi     @S0gctrl,R0       ; Setup serial port
          sti     R0,@S0_gctrl      ; global control
;
; This section of code initializes the AIC
;
AIC_INIT  LDI     0x10,IE           ; Enable only XINT interrupt
          andn    0x34,IF           ;
          ldi     0,R0              ;
          sti     R0,@S0_xdata      ;
          RPTS    0x040             ;
          LDI     2,IOF             ; XF0=0 resets AIC
          rpts    0x40              ;
          LDI     6,IOF             ; XF0=1 runs AIC
;
          ldi     @C_REG,R0         ; Setup control register
          call    prog_AIC          ;
          ldi     0xfffc,R0         ; Program the AIC to be real slow
          call    prog_AIC          ;
          ldi     0xfffc|2,R0       ;
          call    prog_AIC          ;
          ldi     @B_REG,R0         ; Bump up the Fs to final rate
          call    prog_AIC          ; (smaller divisors should be sent last)
          ldi     @A_REG,R0         ;
          call    prog_AIC          ;
          or      OVM,ST            ; Use the overflow mode for fast saturate
          b       main              ; the DRR before going to the main loop


;
; prog_AIC is used to transmit new timing configurations to the AIC.
; If you single step this routine, the AIC timing will be corrupted
; causing AIC programming to fail.
; STEP OVER THIS ROUTINE USING THE F10 FUNCTION STEP
;
prog_AIC  ldi     @S0_xdata,R1      ; Use original DXR data during 2ndy
          sti     R1,@S0_xdata      ;
          idle                      ;
          ldi     @S0_xdata,R1      ; Use original DXR data during 2ndy
          or      3,R1              ; Request 2ndy XMIT
          sti     R1,@S0_xdata      ;
          idle                      ;
          sti     R0,@S0_xdata      ; Send register value
          idle                      ;
          andn    3,R1              ;
          sti     R1,@S0_xdata      ; Leave with original safe value in DXR
;
          ldi     @S0_rdata,R0      ; Fix receiver underrun by dummy read
          rets                      ;
;
; By placing the stack at the end of the users runtime code, the
; maximum space is made available for applications. Essentially once
; used initialization code or data can be reclaimed after it is used.
; However, use this configuration for debug purposes
;
          .start  "STACK",$         ; This is a reminder to put the stack
          .sect   "STACK"           ; stack in a safe place. $ places
stack     .word   stack             ; section at the current assy address
;
; Install the XINT/RINT ISR branch vectors
;
          .start  "SP0VECTS",0x809FC5 ; Place ISR returns directly into
          .sect   "SP0VECTS"        ; secondary branch table
          reti                      ; XINT0
          reti                      ; RINT0


Programming the DMA Channel


The direct memory access (DMA) coprocessor is an on-chip peripheral that 
can read from or write to any location in the memory map without interfering 
with the CPU operation. The DMA channel contains its own address genera- 
tors, source and destination registers, and transfer counters. The DMA chan- 
nel can be easily programmed in C or in assembly language. 


The ’C30 and ’C31 DMA coprocessors each have one DMA channel, while the ’C32
DMA coprocessor has two DMA channels. Each DMA channel of the ’C32 is similar
to that of the ’C30 and ’C31, with the addition of user-configurable
priorities.


This chapter provides examples for programming the DMA for the ’C3x.

7.1 Hints for DMA Programming


The Peripherals chapter of the TMS320C3x User’s Guide describes the DMA 
channel and its operation in detail. Use the following techniques to program 
your DMA more efficiently and to avoid unexpected results: 


- Reset the DMA register before starting it. This clears any previously
  latched interrupts that may no longer exist.


- After starting the DMA, set the IE register to enable interrupts for sync
  transfer.


- If a conflict occurs when the CPU and DMA access the memory simultaneously
  on the ’C30 or ’C31, the CPU always prevails. Carefully allocate the
  sections of the program in memory for faster execution. If a CPU program
  access conflicts with a DMA access, enabling the cache helps if the
  program is located in external memory. DMA on-chip access happens during
  the H3 phase. Refer to the Pipeline Operation chapter in the TMS320C3x
  User’s Guide for details on CPU accesses.


  If a conflict occurs during CPU-DMA access on the ’C32, the priority set
  between the CPU and DMA is used to arbitrate conflicts. If the DMA channel
  has lower priority than the CPU, the DMA may fail to finish a block
  transfer if conflicts occur. To avoid this condition, use CPU/DMA rotating
  priority in the corresponding DMA control register.


  Note: Expansion and Peripheral Buses


  The expansion and peripheral buses on the ’C30 cannot be accessed
  simultaneously because they are multiplexed into a common port. Therefore,
  DMA access to the peripheral bus along with CPU access to the expansion
  bus can cause CPU-DMA conflicts. (See the TMS320C3x User’s Guide for
  more information.)


- When you use interrupt synchronization, ensure that interrupts are actually
  generated; otherwise, the DMA will never complete the block transfer.


- Use read/write synchronization when reading from or writing to serial ports
  to guarantee data validity.




7.2 When a DMA Channel Finishes a Transfer 


Many applications require that you perform certain tasks after a DMA channel 
has finished a block transfer. The following are indications that the DMA has 
finished a set of transfers: 


- The DINT bit in the IF register is set to 1 (interrupt polling). This
  requires that the TCINT bit in the DMA control register be set first. This
  interrupt-polling method does not cause any additional conflict during
  CPU-DMA access.


- The transfer counter has a zero value. The transfer counter is decremented
  after the DMA read operation finishes (not after the write operation).
  Nevertheless, a transfer counter with a zero value can be used as an
  indication of a transfer completion.


- The STAT bits in the DMA channel control register are set to 00b. You
  can poll the DMA channel-control register for this value. However,
  because the DMA registers are memory-mapped into the peripheral bus
  address space, this option can cause further conflicts during CPU-DMA
  access.
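

To make the STAT and TCINT references concrete, the following Python sketch
decodes a DMA global control word into named fields. The bit positions used
here (START in bits 1-0, STAT in bits 3-2, the increment/decrement bits in
bits 7-4, SYNC in bits 9-8, TC in bit 10, TCINT in bit 11) are an assumption
to be confirmed against the Peripherals chapter of the TMS320C3x User's
Guide, and the function names are hypothetical. The example values are the
control words used in the assembly examples of this chapter.

```python
def decode_dma_control(word):
    """Split a DMA global control word into fields (assumed layout)."""
    return {
        "START":  word & 0x3,
        "STAT":   (word >> 2) & 0x3,
        "INCSRC": (word >> 4) & 0x1,
        "DECSRC": (word >> 5) & 0x1,
        "INCDST": (word >> 6) & 0x1,
        "DECDST": (word >> 7) & 0x1,
        "SYNC":   (word >> 8) & 0x3,
        "TC":     (word >> 10) & 0x1,
        "TCINT":  (word >> 11) & 0x1,
    }


def stat_is_zero(word):
    """True when the STAT field reads 00b, the value polled for in the
    third completion test above."""
    return decode_dma_control(word)["STAT"] == 0


# 0C43H: fixed source, incrementing destination, no synchronization,
# interrupt when the transfer counter reaches zero
print(decode_dma_control(0x0C43))
# 0D43H: same, but with SYNC = 01b
print(decode_dma_control(0x0D43)["SYNC"])   # 1
```

Reading the control word costs a peripheral bus access, so polling it in a
tight loop is exactly the case that the third item warns about.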
7.3 DMA Assembly Programming Examples


Example 7-1, Example 7-2, and Example 7-3 illustrate how to program the
DMA channel using assembly language.


When linking the examples, allocate section memory addresses carefully to
avoid CPU-DMA conflict. In the ’C30 or ’C31, the CPU always prevails in cases
of conflict. If a conflict occurs between a CPU program and DMA data, you can
enable the cache if the .text section is in external memory. For example, when
linking the code in Example 7-1, Example 7-2, and Example 7-3, allocate the
following sections into memory (RAM0 corresponds to on-chip RAM block 0 and
RAM1 corresponds to on-chip RAM block 1):


- .text section into RAM0
- .data section into RAM1
- .bss section into RAM1


Example 7-1. Array Initialization With DMA 


* TITLE: ARRAY INITIALIZATION WITH DMA
*
         .GLOBAL START
         .DATA
DMA      .WORD   808000H        ; DMA GLOBAL CONTROL REG ADDRESS
RESET    .WORD   0C40H          ; DMA GLOBAL CONTROL REG RESET VALUE
CONTROL  .WORD   0C43H          ; DMA GLOBAL CONTROL REG INITIALIZATION
SOURCE   .WORD   ZERO           ; DATA SOURCE ADDRESS
DESTIN   .WORD   _ARRAY         ; DATA DESTINATION ADDRESS
COUNT    .WORD   128            ; NUMBER OF WORDS TO TRANSFER
ZERO     .FLOAT  0.0            ; ARRAY INITIALIZATION VALUE 0.0 = 0X80000000
         .BSS    _ARRAY,128     ; DATA ARRAY LOCATED IN .BSS SECTION
         .TEXT
START    LDP     DMA            ; LOAD DATA PAGE POINTER
         LDI     @DMA,AR0       ; POINT TO DMA GLOBAL CONTROL REGISTER
         LDI     @RESET,R0      ; RESET DMA
         STI     R0,*AR0
         LDI     @SOURCE,R0     ; INITIALIZE DMA SOURCE ADDRESS REGISTER
         STI     R0,*+AR0(4)
         LDI     @DESTIN,R0     ; INITIALIZE DMA DESTINATION ADDRESS REGISTER
         STI     R0,*+AR0(6)
         LDI     @COUNT,R0      ; INITIALIZE DMA TRANSFER COUNTER REGISTER
         STI     R0,*+AR0(8)
         OR      400H,IE        ; ENABLE INTERRUPT FROM DMA TO CPU
         OR      2000H,ST       ; ENABLE CPU INTERRUPTS GLOBALLY
         LDI     @CONTROL,R0    ; INITIALIZE DMA GLOBAL CONTROL REGISTER
         STI     R0,*AR0        ; START DMA TRANSFER
         BU      $
         .END
In Example 7-1, the DMA initializes a 128-element array to 0. The DMA sends
an interrupt to the CPU after the transfer is completed. This program assumes
previous initialization of the CPU interrupt vector table (specifically the
DMA-to-CPU interrupt). The ST and IE registers are initialized for interrupt
processing.


In Example 7-2, serial port 0 is initialized to receive 32-bit data words with
an internally generated receive-bit clock and a bit-transfer rate of
8 H1 cycles/bit.


This program assumes previous initialization of the CPU interrupt vector table 
(specifically the DMA-to-CPU interrupt). The serial-port interrupt directly affects 
only the DMA; therefore, no CPU serial-port interrupt vector setting is required. 
Example 7-2. DMA Transfer With Serial-Port Receive Interrupt


* TITLE: DMA TRANSFER WITH SERIAL PORT RECEIVE INTERRUPT
*
         .GLOBAL START
         .DATA
DMA      .WORD   808000H        ; DMA GLOBAL CONTROL REG ADDRESS
CONTROL  .WORD   0D43H          ; DMA GLOBAL CONTROL REG INITIALIZATION
SOURCE   .WORD   80804CH        ; DATA SOURCE ADDRESS: SERIAL PORT INPUT REG
DESTIN   .WORD   _ARRAY         ; DATA DESTINATION ADDRESS
COUNT    .WORD   128            ; NUMBER OF WORDS TO TRANSFER
IEVAL    .WORD   00200400H      ; IE REGISTER VALUE
RESET1   .WORD   0D40H          ; DMA RESET
         .BSS    _ARRAY,128     ; DATA ARRAY LOCATED IN .BSS SECTION
                                ; THE UNDERSCORE USED IS JUST TO MAKE IT
                                ; ACCESSIBLE FROM C (OPTIONAL)
         .TEXT
START    LDP     DMA            ; LOAD DATA PAGE POINTER
* DMA INITIALIZATION
         LDI     @DMA,AR0       ; POINT TO DMA GLOBAL CONTROL REGISTER
         LDI     @SPORT,AR1
         LDI     @RESET,R0
         STI     R0,*+AR1(4)    ; RESET SPORT TIMER
         LDI     @RESET1,R0
         STI     R0,*AR0        ; RESET DMA
         LDI     @SPRESET,R0
         STI     R0,*AR1        ; RESET SPORT
         LDI     @SOURCE,R0     ; INITIALIZE DMA SOURCE ADDRESS REGISTER
         STI     R0,*+AR0(4)
         LDI     @DESTIN,R0     ; INITIALIZE DMA DESTINATION ADDRESS REGISTER
         STI     R0,*+AR0(6)
         LDI     @COUNT,R0      ; INITIALIZE DMA TRANSFER COUNTER REGISTER
         STI     R0,*+AR0(8)
         OR      @IEVAL,IE      ; ENABLE INTERRUPTS
         OR      2000H,ST       ; ENABLE CPU INTERRUPTS GLOBALLY
         LDI     @CONTROL,R0    ; INITIALIZE DMA GLOBAL CONTROL REGISTER
         STI     R0,*AR0        ; START DMA TRANSFER
* SERIAL PORT INITIALIZATION
         LDI     @SRCTRL,R0     ; SERIAL-PORT RECEIVE CONTROL REG INITIALIZATION
         STI     R0,*+AR1(3)
         LDI     @STPERIOD,R0   ; SERIAL-PORT TIMER PERIOD INITIALIZATION
         STI     R0,*+AR1(6)
         LDI     @STCTRL,R0     ; SERIAL-PORT TIMER CONTROL REG INITIALIZATION
         STI     R0,*+AR1(4)
         LDI     @SGCCTRL,R0    ; SERIAL-PORT GLOBAL CONTROL REG INITIALIZATION
         STI     R0,*AR1
         BU      $
         .END
Example 7-3 sets up the DMA to transfer data (128 words) from an array buffer
to the serial-port-0 output register with serial-port transmit interrupt
XINT0. The DMA sends an interrupt to the CPU when the data transfer completes.


Serial port 0 is initialized to transmit 32-bit data words with an internally
generated frame sync and a bit-transfer rate of 8 H1 cycles/bit. The
receive-bit clock is internally generated and equal in frequency to one half
of the ’C3x H1 frequency.


This program assumes previous initialization of the CPU interrupt vector table 
(specifically the DMA-to-CPU interrupt). The serial-port interrupt directly affects 
only the DMA; therefore, no CPU serial-port interrupt vector setting is required. 


Note: Serial Port Transmit Synchronization


The DMA uses serial port transmit interrupt XINT0 to synchronize transfers.
Because XINT0 is generated when the transmit buffer has written the last
bit of data to the shifter, an initial CPU write to the serial port is required to
trigger XINT0 and enable the first DMA transfer.


Example 7-3. DMA Transfer With Serial-Port Transmit Interrupt 


* TITLE: DMA TRANSFER WITH SERIAL PORT TRANSMIT INTERRUPT
          .GLOBAL START

          .DATA
DMA       .WORD  808000H      ; DMA GLOBAL CONTROL REG ADDRESS
CONTROL   .WORD  0E13H        ; DMA GLOBAL CONTROL REG INITIALIZATION
SOURCE    .WORD  (_ARRAY+1)   ; DATA SOURCE ADDRESS
DESTIN    .WORD  808048H      ; DATA DESTIN ADDRESS: SERIAL-PORT OUTPUT REG
COUNT     .WORD  127          ; NUMBER OF WORDS TO TRANSFER = (MSG LENGTH-1)
IEVAL     .WORD  00100400H    ; IE REGISTER VALUE
          .BSS   _ARRAY,128   ; DATA ARRAY LOCATED IN .BSS SECTION
                              ; THE UNDERSCORE USED IS JUST TO MAKE IT
                              ; ACCESSIBLE FROM C (OPTIONAL)
RESET1    .WORD  0E10H        ; DMA RESET
SPORT     .WORD  808040H      ; SERIAL-PORT GLOBAL CONTROL REG ADDRESS
SGCCTRL   .WORD  04880044H    ; SERIAL-PORT GLOBAL CONTROL REG INITIALIZATION
SXCTRL    .WORD  111H         ; SERIAL-PORT TX PORT CONTROL REG INITIALIZATION
STCTRL    .WORD  00FH         ; SERIAL-PORT TIMER CONTROL REG INITIALIZATION
STPERIOD  .WORD  00000002H    ; SERIAL-PORT TIMER PERIOD
SPRESET   .WORD  00880044H    ; SERIAL-PORT RESET
RESET     .WORD  0H           ; SERIAL-PORT TIMER RESET

          .TEXT
START     LDP    DMA          ; LOAD DATA PAGE POINTER
* DMA INITIALIZATION
          LDI    @DMA,AR0     ; POINT TO DMA GLOBAL CONTROL REGISTER
          LDI    @SPORT,AR1
          LDI    @RESET,R0
          STI    R0,*+AR1(4)  ; RESET SPORT TIMER
          LDI    @RESET1,R0
          STI    R0,*AR0      ; RESET DMA
          LDI    @SPRESET,R0
          STI    R0,*AR1      ; RESET SPORT
          LDI    @SOURCE,R0   ; INITIALIZE DMA SOURCE ADDRESS REGISTER
          STI    R0,*+AR0(4)
          LDI    @DESTIN,R0   ; INITIALIZE DMA DESTINATION ADDRESS REGISTER
          STI    R0,*+AR0(6)
          LDI    @COUNT,R0    ; INITIALIZE DMA TRANSFER COUNTER REGISTER
          STI    R0,*+AR0(8)
          OR     @IEVAL,IE    ; ENABLE INTERRUPT FROM DMA TO CPU
          OR     2000H,ST     ; ENABLE CPU INTERRUPTS GLOBALLY
          LDI    @CONTROL,R0  ; INITIALIZE DMA GLOBAL CONTROL REGISTER
          STI    R0,*AR0      ; START DMA TRANSFER
* SERIAL PORT INITIALIZATION
          LDI    @SXCTRL,R0   ; SERIAL-PORT TX CONTROL REG INITIALIZATION
          STI    R0,*+AR1(2)
          LDI    @STPERIOD,R0 ; SERIAL-PORT TIMER PERIOD INITIALIZATION
          STI    R0,*+AR1(6)
          LDI    @STCTRL,R0   ; SERIAL-PORT TIMER CONTROL REG INITIALIZATION
          STI    R0,*+AR1(4)
          LDI    @SGCCTRL,R0  ; SERIAL-PORT GLOBAL CONTROL REG INITIALIZATION
          STI    R0,*AR1
* CPU WRITES THE FIRST WORD (TRIGGERING EVENT ---> XINT IS GENERATED)
          LDI    @SOURCE,AR0
          LDI    *-AR0(1),R0
          STI    R0,*+AR1(8)
          BU     $
          .END


Other examples of DMA initialization include:


- Transfer a 256-word block of data from off-chip memory to on-chip
  memory and generate an interrupt on completion. Maintain the memory
  order.

  DMA source address               800000h
  DMA destination address          809800h
  DMA transfer counter             00000100h
  DMA global control               00000C53h
  CPU/DMA interrupt enable (IE)    00000400h

- Transfer a 128-word block of data from on-chip memory to off-chip
  memory and generate an interrupt on completion. Invert the order of
  memory: the highest addressed member of the block becomes the
  lowest addressed member.

  DMA source address               809800h
  DMA destination address          800000h
  DMA transfer counter             00000080h
  DMA global control               00000C93h
  CPU/DMA interrupt enable (IE)    00000400h

- Transfer a 200-word block of data from the serial-port-0 receive register
  to on-chip memory and generate an interrupt on completion. Synchronize
  the transfer with the serial-port-0 receive interrupt.

  DMA source address               80804Ch
  DMA destination address          809C00h
  DMA transfer counter             000000C8h
  DMA global control               00000D43h
  CPU/DMA interrupt enable (IE)    00200400h

- Transfer a 200-word block of data from off-chip memory to the serial-port-0
  transmit register and generate an interrupt on completion. Synchronize
  the transfer with the serial-port-0 transmit interrupt.

  DMA source address               809C00h
  DMA destination address          808048h
  DMA transfer counter             000000C8h
  DMA global control               00000E13h
  CPU/DMA interrupt enable (IE)    00400400h

- Transfer data continuously between the serial-port-0 receive register and
  the serial-port-0 transmit register to create a digital loopback. Synchronize
  the transfer with the serial-port-0 receive and transmit interrupts.

  DMA source address               80804Ch
  DMA destination address          808048h
  DMA transfer counter             00000000h
  DMA global control               00000303h
  CPU/DMA interrupt enable (IE)    00300000h
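
The global control values in the list above follow a regular pattern, which the following C sketch makes explicit. The bit positions and the names (DMA_INCSRC, dma_gctrl, and so on) are assumptions inferred from the five values listed, not definitions taken from a TI header file; verify them against the DMA register description in the TMS320C3x User's Guide before use.

```c
#include <stdint.h>

/* Assumed field layout of the DMA global control register, inferred
   from the example values above (hypothetical names). */
#define DMA_START_RUN  0x003u   /* START = 11b: DMA runs               */
#define DMA_INCSRC     0x010u   /* increment source address            */
#define DMA_DECSRC     0x020u   /* decrement source address            */
#define DMA_INCDST     0x040u   /* increment destination address       */
#define DMA_DECDST     0x080u   /* decrement destination address       */
#define DMA_SYNC_NONE  0x000u   /* no synchronization                  */
#define DMA_SYNC_SRC   0x100u   /* synchronize reads (source)          */
#define DMA_SYNC_DST   0x200u   /* synchronize writes (destination)    */
#define DMA_SYNC_BOTH  0x300u   /* synchronize reads and writes        */
#define DMA_TC         0x400u   /* stop when transfer counter hits 0   */
#define DMA_TCINT      0x800u   /* interrupt when counter hits 0       */

/* Build a global control word that starts the DMA.
   addr_mode:    OR of DMA_INCxxx / DMA_DECxxx flags
   sync:         one of the DMA_SYNC_xxx values
   stop_and_irq: nonzero for a counted block transfer with an interrupt
                 on completion, zero for a continuous transfer */
uint32_t dma_gctrl(uint32_t addr_mode, uint32_t sync, int stop_and_irq)
{
    uint32_t word = DMA_START_RUN | addr_mode | sync;
    if (stop_and_irq)
        word |= DMA_TC | DMA_TCINT;
    return word;
}
```

For instance, dma_gctrl(DMA_INCSRC | DMA_DECDST, DMA_SYNC_NONE, 1) reproduces the 00000C93h value of the order-inverting transfer, and dma_gctrl(0, DMA_SYNC_BOTH, 0) reproduces the 00000303h loopback value.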




Chapter 8 


Analog Interface Peripherals and Applications 


Analog interface peripherals are analog input/output devices that interface
directly to the ’C3x. This chapter describes these devices and their applications
in ’C3x-based systems.

8.1 Analog-to-Digital Converter Interface to the TMS320C30 Expansion Bus


Analog-to-digital converters (ADCs) and digital-to-analog converters (DACs)
are commonly required in DSP systems and interface efficiently to the I/O
expansion bus. These devices are available in many speed ranges and with
a variety of features. While some might require one or more wait states on the
I/O bus, others can be used at full speed. Figure 8-1 illustrates a ’C30 interface
to an Analog Devices AD1678 ADC. The AD1678 is a 12-bit, 5-µs converter
that allows sample rates up to 200 kHz and has an input voltage range of 10 V,
bipolar or unipolar. The converter is connected according to the manufacturer’s
specifications to provide 0-10-V operation. This interface illustrates a
common approach to connecting such devices to the ’C30. Note that the interface
requires only a minimum amount of control logic.


The AD1678 is a very flexible converter and is configurable in a number of dif- 
ferent operating modes. These operating modes include: 


- Byte or word data format
- Continuous or noncontinuous conversions
- Enabled or disabled chip-select function
- Programmable end-of-conversion indication


This interface uses a data format of 12-bit words, rather than a byte format, to 
be compatible with the ’C3x. Noncontinuous conversions are selected so that 
variable sample rates can be used; continuous conversions occur at a fixed 
rate of 200 kHz. With noncontinuous conversions, the host processor deter- 
mines the conversion rate by initiating conversions through write operations 
to the converter. 


The chip-select input must be active when accessing the device. Enabling the 
chip-select function is necessary to isolate the AD1678 from other peripheral 
devices connected to the expansion bus. To establish the desired operating 
modes, the SYNC and 12/8 inputs to the converter are pulled high and EOCEN 
is grounded, as specified in the AD1678 Data Sheet. 


In this application, the converter’s chip-select is driven by XA12, which maps 
this device at 804000h in I/O address space. Conversions are initiated by writ- 
ing any data value to the device. The conversion results are obtained by read- 
ing from the device after the conversion is complete. To generate the device’s 
start conversion (SC) and output enable (OE) inputs, the 74AS32 performs an 
AND operation on IOSTRB and R/W (see Figure 8-1). Therefore, the conver- 
ter is selected whenever XA12 is low; OE is driven when reads are performed, 
and SC is driven when writes are performed. 

Figure 8-1. Interface Between the TMS320C30 and the AD1678


As with many A/D converters, the AD1678 data output lines enter a
high-impedance state at the end of a read cycle. This occurs after the output enable
(OE) or read control line goes inactive. Furthermore, the data output buffer
often requires a substantial amount of time to actually attain a full
high-impedance state. When used with the ’C30-33, device output must be fully disabled
no later than 65 ns following the rising edge of IOSTRB. This is because the
’C30 begins driving the data bus at this point if the next cycle is a write. If this
timing is not met, bus conflicts between the ’C30 and the AD1678 can occur.
This degrades system performance and may cause failure due to damaged
data bus drivers. The actual disable time for the AD1678 can be as long as
80 ns; therefore, 74LS244 buffers are used to isolate the converter outputs
from the ’C30. The buffers are enabled when the AD1678 is read and are
turned off 30.8 ns after IOSTRB goes high, meeting the ’C30-33 requirement
of 65 ns.


When data is read following a conversion, the AD1678 takes 100 ns after its 
OE control line is asserted to provide valid data at its outputs. Thus, including 
the propagation delay of the 74LS244 buffers, the total access time for reading 
the converter is 118 ns. This requires two wait states on the ’C30-33 expansion 
I/O bus. 
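
The two timing checks described above (output disable time and read access time) reduce to simple arithmetic. The following C sketch restates them using only the figures quoted in the text; the function and constant names are hypothetical.

```c
#include <stdbool.h>

/* 'C30-33 limit quoted in the text: outputs must be off no later than
   65 ns after the rising edge of IOSTRB. */
#define C30_33_MAX_DISABLE_NS 65.0

/* True if a device releases the data bus early enough. */
bool disables_in_time(double disable_ns)
{
    return disable_ns <= C30_33_MAX_DISABLE_NS;
}

/* Total read access time: converter access time plus the propagation
   delay of the isolating buffers. */
double read_access_ns(double converter_ns, double buffer_delay_ns)
{
    return converter_ns + buffer_delay_ns;
}
```

With the values from the text, disables_in_time(80.0) is false for the bare AD1678, disables_in_time(30.8) is true for the buffered design, and read_access_ns(100.0, 18.0) gives the 118 ns access time that leads to two wait states.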


The two wait states required in this case are implemented using software wait 
states. However, depending on the overall system configuration, you can im- 
plement a separate wait-state generator for the expansion bus (for example, 
in a case where multiple devices that require different numbers of wait states 
are connected to the expansion bus). See section 4.5 Wait States and Ready 
Generation on page 4-10. 


Figure 8-2 shows the timing for read operations between the ’C30-33 and the
AD1678. At the beginning of the cycle, the address and XR/W lines become
valid at 10 ns (t1) following the falling edge of H1. Then, after 10 ns (t2) from
the next rising edge of H1, IOSTRB goes low. This begins the active portion
of the read cycle. After the control logic propagation delay of 5.8 ns (t3), the
IOR signal goes low, asserting the OE input to the AD1678. The 74LS244
buffers take 30 ns (t4) to enable their outputs. Then, after the converter access
delay and the buffer propagation delay of 118 ns (t5, which equals 100 + 18),
data is provided to the ’C30. This provides approximately 46 ns of data setup
time before the rising edge of IOSTRB. Therefore, this design easily satisfies
the ’C30-33’s requirement of 15 ns of data setup time for reads.


Figure 8-2. Read Operations Timing Between the TMS320C30 and the AD1678 

Unlike the primary bus, read and write cycles on the I/O expansion bus are
timed the same, with the following exceptions:


- XR/W is high for reads and low for writes
- The data bus is driven by the ’C30 during writes


When writing to the AD1678, the 74LS244 buffers do not turn on and no data 
is transferred. The purpose of writing to the converter is only to generate a 
pulse on the converter’s SC input, which initiates a conversion cycle. When a 
conversion cycle is completed, the AD1678’s end of conversion (EOC) output 
generates an interrupt on the ’C30 to indicate that the converted data can be 
read. 


The TLC1225 is a self-calibrating 12-bit-plus-sign bipolar or unipolar
converter, which features 10-µs conversion times. The TLC1550 is a 10-bit, 6-µs
converter with a high-speed DSP interface. Both converters are parallel-interface
devices.

8.2 Digital-to-Analog Converter Interface to the TMS320C30 Expansion Bus


In many DSP systems, the requirement for generating an analog output signal
is a consequence of sampling an analog waveform with an ADC so that it can 
be processed digitally. This digitally processed signal is then reproduced with 
a digital-to-analog converter (DAC). Interfacing the DAC to the ’C30 on the 
expansion I/O bus is also straightforward. 


Various types of DACs may be distinguished by whether or not the converters 
include: 


- Latches to store the digital value to be converted to an analog quantity
- The interface to control those latches


When latches and control logic are included, interface design is often simpli- 
fied; however, internal latches are often included only in slower DACs. 


Although slower converters limit signal bandwidth, the converter design 
described in Figure 8-3 allows a reasonably wide range of signal frequencies 
to be processed and illustrates the technique of interfacing to a converter that 
uses external data latches. 


Figure 8-3 shows an interface to an Analog Devices AD565A DAC. This
device is a 12-bit, 250-ns current-output DAC with an on-chip 10-V reference.
Using an off-chip current-to-voltage conversion circuit connected according to
the manufacturer’s specifications, the converter provides an output signal range
of 0-10 V, which is compatible with the conversion range of the ADC discussed
in the previous section.


Because this DAC essentially performs continuous conversions based on the 
digital value provided at its inputs, periodic sampling is maintained by updating 
the value stored in the external latches at regular intervals. Therefore, 
between updates, the digital value is stored and maintained at the latch out- 
puts that provide the input to the DAC. This results in a stable analog output 
until the next sample update is performed. 

Figure 8-3. Interface Between the TMS320C30 and the AD565A


The external data latches are 74LS377 devices that have both clock and
enable inputs. These latches serve as a convenient interface with the ’C30; the
enable inputs provide a device-select function and the clock inputs latch the
data. The enable input is driven by inverted XA12 and the clock input is driven by
IOW (which is the AND of IOSTRB and XR/W). Therefore, data is stored in the
latches when a write is performed to I/O address 805000h. Reading this
address has no effect on the circuit.
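
The address decoding used by the ADC and DAC interfaces can be modeled with a few lines of C. This is only an illustrative host-side model: it assumes the IOSTRB-decoded expansion-bus I/O space spans 804000h through 805FFFh, and the type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    IO_NONE,        /* no effect on either converter circuit       */
    IO_ADC_START,   /* write with XA12 low: pulse SC, start convert */
    IO_ADC_READ,    /* read with XA12 low: assert OE, read result   */
    IO_DAC_LATCH    /* write with XA12 high: clock the DAC latches  */
} io_action;

/* Model of the XA12 / R/W decoding described in sections 8.1 and 8.2. */
io_action decode_io(uint32_t addr, int is_write)
{
    if (addr < 0x804000u || addr > 0x805FFFu)
        return IO_NONE;                     /* outside assumed I/O space */
    if ((addr >> 12) & 1u)                  /* XA12 high: DAC latches    */
        return is_write ? IO_DAC_LATCH : IO_NONE;
    return is_write ? IO_ADC_START : IO_ADC_READ;   /* XA12 low: ADC    */
}
```

In this model a write to 804000h starts a conversion, a read from 804000h returns the conversion result, and only a write to 805000h updates the DAC latches.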


Figure 8-4 shows the timing diagram of a write operation to the DAC latches.


Figure 8-4. Timing Diagram for Write Operation to the DAC


Because the data is written to the latches, rather than to the DAC, the timing
requirements for these devices are fundamental to the operation of the
interface. At a minimum, these latches require:


- Data setup time of 20 ns
- Enable setup time of 25 ns
- Disable setup time of 10 ns
- Data and enable hold times of 5 ns


This design provides approximately 60 ns of enable setup, 30 ns of data setup, 
and 7.2 ns of data hold time. Therefore, the setup and hold times provided by 
this design exceed those required by the latches. The key timing parameters 
for this interface are summarized in Table 8-1. 
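
The margin between what the design provides and what the latches require can be checked mechanically. The C sketch below uses only the numbers quoted above; the structure and function names are hypothetical.

```c
/* One timing parameter: what the latch requires and what the
   design provides, both in nanoseconds. */
typedef struct {
    const char *name;
    double required_ns;
    double provided_ns;
} timing_param;

/* Figures quoted in the text for the 74LS377 latch interface. */
static const timing_param dac_write_timing[] = {
    { "enable setup", 25.0, 60.0 },
    { "data setup",   20.0, 30.0 },
    { "data hold",     5.0,  7.2 },
};

/* Slack in nanoseconds; a negative value means a timing violation. */
double timing_margin_ns(const timing_param *p)
{
    return p->provided_ns - p->required_ns;
}

/* Returns 1 if every parameter in the table has non-negative slack. */
int timing_ok(const timing_param *table, int count)
{
    for (int k = 0; k < count; k++)
        if (timing_margin_ns(&table[k]) < 0.0)
            return 0;
    return 1;
}
```

The resulting margins are 35 ns for enable setup, 10 ns for data setup, and about 2.2 ns for data hold, so the hold time is the tightest parameter of this design.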

Table 8-1. Key Timing Parameters for DAC Write Operation


Time Interval    Event                             Time Period†

t1               H1 falling to address valid       10 ns
t2               XA12 to inverted XA12 delay       5 ns
t3               H1 rising to IOSTRB falling       10 ns
t4               IOSTRB to IOW delay               5.8 ns
t5               Data setup to IOW                 30 ns
t6               Data hold from IOW                7.2 ns

† Timing for the ’C30-33

8.3 Burr-Brown DSP101/2 and DSP201/2 Interface to TMS320C3x


Figure 8-5 shows how to interface the ’C3x with zero glue logic to
Burr-Brown’s DSP201/2 and DSP101/2 family of 16-bit DACs and ADCs. Using a ’C3x
and the DSP202 and DSP102 dual-channel DAC and ADC chips provides an
efficient, low-cost, stereo, digital audio interface.


Figure 8-5. TMS320C31 Zero Glue-Logic Interface to Burr-Brown ADC and DAC


The DSP102 ADC is interfaced to the ’C3x serial port receive side; the DSP202
DAC is interfaced to the transmit side. The ADC and DAC are hard-wired to
run in cascade mode. In this mode, when the ’C3x initiates a convert command
(CONV) to the ADC through its TCLK0 pin, both analog inputs are converted
into two 16-bit words that are concatenated to form one 32-bit word. The ADC
signals the ’C3x that serial data from the last conversion is being transmitted
through the ADC’s SYNC signal. The 32-bit word is then serially transmitted,
most significant bit (MSB) first, through the SOUTA serial pin of the DSP102
to the DR0 pin of the ’C3x serial port. The ’C3x is programmed to drive the
analog interface bit clock from its CLKX0 pin. The bit clock drives both the ADC
and DAC XCLK input.


The ’C3x transmit clock can also act as the input clock on the receive side of
the ’C3x serial port. Since the receive clock is synchronous to the ’C3x’s
internal clock, the receive clock can run at full speed (even though it is an external
clock).

Similarly, upon receiving a convert command (CONV), the DAC converts the
last word received from the ’C3x. It signals the ’C3x, through the SYNC signal,
to begin transmitting a 32-bit word representing the two channels of data to be
converted. The data, transmitted from the ’C3x DX0 pin, is input to both the
SINA and SINB inputs of the DAC.
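
The cascading of two 16-bit channel samples into one 32-bit serial word can be illustrated in portable C with shifts and masks. The sketch assumes that channel 0 occupies the low-order half of the word and channel 1 the high-order half; that ordering, and the function names, are assumptions made for illustration and should be checked against the converter data sheets.

```c
#include <stdint.h>

/* Combine two signed 16-bit samples into one 32-bit cascade word.
   Assumed layout: channel 1 in bits 31-16, channel 0 in bits 15-0. */
uint32_t casc_pack(int16_t chan0, int16_t chan1)
{
    return ((uint32_t)(uint16_t)chan1 << 16) | (uint32_t)(uint16_t)chan0;
}

/* Sign-extend a 16-bit field without relying on implementation-defined
   conversions. */
static int32_t sign_extend16(uint32_t field)
{
    field &= 0xFFFFu;
    return (field & 0x8000u) ? (int32_t)field - 0x10000 : (int32_t)field;
}

/* Extract channel 0 (low half) from a cascade word. */
int32_t casc_chan0(uint32_t word)
{
    return sign_extend16(word);
}

/* Extract channel 1 (high half) from a cascade word. */
int32_t casc_chan1(uint32_t word)
{
    return sign_extend16(word >> 16);
}
```

Packing the samples -1 and 2 under this assumed layout gives 0002FFFFh, and extracting both halves returns the original signed values.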


The ’C3x is set up to transfer bits at the maximum rate of about 8 Mbits/s.
It uses a dual-channel sample rate of about 44.1 kHz by setting the following
registers (assuming a 32-MHz CLKIN):


Serial Port:

Port global control register                0x0EBC0040
FSX/DX/CLKX port control register           0x00000111
FSR/DR/CLKR port control register           0x00000111
Receive/transmit timer control register     0x0000000F

Timer:

Timer global control register               0x000002C1
Timer period register                       0x000000B5
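
The relationship between the timer period register and the convert-command rate can be checked with a short calculation. The sketch assumes the pulse-mode relation used elsewhere in this guide for the ’C3x timer (output rate = CLKIN / 4 / period, so that a 50-MHz CLKIN with a period of 1 gives 12.5 MHz); the function name is hypothetical.

```c
/* Timer output rate in pulse mode (timer control value 0x2C1),
   assuming rate = CLKIN / 4 / period. */
double timer_pulse_rate_hz(double clkin_hz, unsigned period)
{
    return clkin_hz / 4.0 / (double)period;
}
```

With a 32-MHz CLKIN and a period of 0xB5 (181), the result is roughly 44.2 kHz, consistent with the sample rate of about 44.1 kHz stated above. The same relation applied to the 30-MHz values in the driver header gives roughly 44.1 kHz for a period of 0xAA and roughly 48.1 kHz for 0x9C.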


A synchronous receive interrupt service routine is sufficient for parsing and
transferring data between the serial ports and memory. Source code for setting
up the serial port and timers of the ’C3x for interfacing to the DSP102 and
DSP202 can be found on the TI BBS (file name: C3XBB.EXE). This code is
listed in Example 8-1 through Example 8-4.

Example 8-1. TMS320C3x / BB - DSP102/202 Driver Header File


/*********************************************************************/
/* BB.H                                                              */
/*                                                                   */
/* TMS320C3x - BB DSP102/202 DRIVER HEADER FILE                      */
/*********************************************************************/
#include <serprt30.h>
#include <timer30.h>
#include <dma30.h>
#include <bus30.h>
#include <general.h>

/*********************************************************************/
/* COMMON STRUCTURES                                                 */
/*********************************************************************/
typedef volatile int VI;
typedef volatile float VF;
typedef VF * volatile VPVF;
typedef VI * volatile VPVI;

/*********************************************************************/
/* FUNCTION PROTOTYPES                                               */
/*********************************************************************/
void c_int99(void);
void heap_overflow(void);
void init_c30(void);
void error_in_real_time(void);

/*********************************************************************/
/* MACROS                                                            */
/*********************************************************************/
#define BLOCK_SIZE    64              /* BUFFER SIZE              */
#define GEN_OSC       OFF             /* GENERATE OSCILLATOR      */
#define GEN_CC        ON              /* GENERATE CONVERT COMMAND */
#define SER_NUM       SERIAL_PORT_ONE
#define OSC_TIMER_NUM TIMER_ZERO
#define CC_TIMER_NUM  TIMER_ONE
#define XF_NUM        1
#define ERROR_CHECK   ON
#define WAIT_BUFFERS  while(!buffer_rcvd || !buffer_xmtd);
#define RESET_FLAGS   buffer_rcvd = buffer_xmtd = FALSE
#define INIT_ARRAYS   init_arrays(t_buffer, r_buffer)
#if XF_NUM
#define RESET_BB      asm(" AND 2Fh,IOF"); asm(" OR 20h,IOF")
#define UN_RESET_BB   asm(" OR 60h,IOF")
#else
#define RESET_BB      asm(" AND 0F2h,IOF"); asm(" OR 2h,IOF")
#define UN_RESET_BB   asm(" OR 6h,IOF")
#endif



/* TIMER PERIOD VALUES ARE BASED ON AN INPUT CLOCK OF 30 MHz */
#define CD            0xAA
#define DAT           0x9C
#define TIMER_PERIOD  CD
#define WAIT(A)       for (i=0;i<A;i++);

/*********************************************************************/
/* STRUCTURES                                                        */
/*********************************************************************/
typedef union
{
    unsigned int _intval;
    struct {
        signed int chan0 :16;
        signed int chan1 :16;
    } _bitval;
} BB_CASC_WORD;

/*********************************************************************/
/* GLOBAL VARIABLES                                                  */
/*********************************************************************/
extern int  t_buffer;      /* OUTPUT BUFFER SIZE                      */
extern int  r_buffer;      /* INPUT BUFFER SIZE                       */
extern VPVF output0;       /* OUTPUT DATA BUFFER FOR PROCESSOR        */
extern VPVF input0;        /* INPUT DATA BUFFER FOR PROCESSOR         */
extern VPVF output_xfer0;  /* OUTPUT DATA BUFFER FOR ISR/BB           */
extern VPVF input_xfer0;   /* INPUT DATA BUFFER FOR ISR/BB            */
extern VPVF output1;       /* OUTPUT DATA BUFFER FOR PROCESSOR        */
extern VPVF input1;        /* INPUT DATA BUFFER FOR PROCESSOR         */
extern VPVF output_xfer1;  /* OUTPUT DATA BUFFER FOR ISR/BB           */
extern VPVF input_xfer1;   /* INPUT DATA BUFFER FOR ISR/BB            */
extern VI   buffer_rcvd;   /* CPU-ISR COMM FLAG (INPUT)               */
extern VI   buffer_xmtd;   /* CPU-ISR COMM FLAG (OUTPUT)              */
extern VI   r_index;       /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
extern VI   t_index;       /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
extern VI   i;             /* GENERIC COUNTER VARIABLE                */

/*********************************************************************/
/* FUNCTION PROTOTYPES                                               */
/*********************************************************************/
/*********************************************************************/
/* BB DRIVER FUNCTIONS                                               */
/*********************************************************************/
void init_arrays(int t_buffer_size, int r_buffer_size);
void init_bb(int period_value);
#if SER_NUM
void c_int07(void);
#else
void c_int05(void);
#endif

Example 8-2. TMS320C3x - BB DSP102/202 Driver


/*********************************************************************/
/* BBDRVR.C                                                          */
/*                                                                   */
/* TMS320C3x - BB DSP102/202 DRIVER                                  */
/*********************************************************************/
#include <math.h>
#include <stdlib.h>
#include <bb.h>

/*********************************************************************/
/* GLOBAL VARS                                                       */
/*********************************************************************/
int  t_buffer = BLOCK_SIZE;  /* OUTPUT BUFFER SIZE                      */
int  r_buffer = BLOCK_SIZE;  /* INPUT BUFFER SIZE                       */
VPVF output0;                /* OUTPUT DATA BUFFER FOR PROCESSOR        */
VPVF input0;                 /* INPUT DATA BUFFER FOR PROCESSOR         */
VPVF output_xfer0;           /* OUTPUT DATA BUFFER FOR ISR/BB           */
VPVF input_xfer0;            /* INPUT DATA BUFFER FOR ISR/BB            */
VPVF output1;                /* OUTPUT DATA BUFFER FOR PROCESSOR        */
VPVF input1;                 /* INPUT DATA BUFFER FOR PROCESSOR         */
VPVF output_xfer1;           /* OUTPUT DATA BUFFER FOR ISR/BB           */
VPVF input_xfer1;            /* INPUT DATA BUFFER FOR ISR/BB            */
VI   buffer_rcvd = FALSE;    /* CPU-ISR COMM FLAG (INPUT)               */
VI   buffer_xmtd = FALSE;    /* CPU-ISR COMM FLAG (OUTPUT)              */
VI   r_index = 0;            /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
VI   t_index = 0;            /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
VI   i;                      /* GENERIC COUNTER VARIABLE                */

/*********************************************************************/
/* FUNCTION DECLARATIONS                                             */
/*********************************************************************/
/*********************************************************************/
/* VOID C_INT05() OR C_INT07():                                      */
/* ISR FOR HANDLING DATA TRANSFER BETWEEN C3X SERIAL PORT            */
/* ONE AND THE A/D,D/A. ASSUMES SYNCHRONOUS OPERATION.               */
/*********************************************************************/
#if SER_NUM
void c_int05(void) {}
void c_int07(void)
#else
void c_int07(void) {}
void c_int05(void)
#endif
{
    BB_CASC_WORD temp;
    VPVF swap;
    /* DSP102/202 TRANSFER TWO SIXTEEN BIT WORDS REPRESENTING    */
    /* BOTH CHANNELS IN ONE THIRTYTWO BIT WORD. EXTRACT INTO     */
    /* THE INPUT_XFER BUFFERS                                    */
    temp._intval = SERIAL_PORT_ADDR(SER_NUM)->r_data;
    input_xfer0[r_index] = temp._bitval.chan0;
    input_xfer1[r_index] = temp._bitval.chan1;

    /* WRITE OUTPUT_XFER BUFFER VALUE BY CASCADING BOTH CHANNELS */
    temp._bitval.chan0 = output_xfer0[t_index];
    temp._bitval.chan1 = output_xfer1[t_index];
    SERIAL_PORT_ADDR(SER_NUM)->x_data = temp._intval;

    /* CHECK IF BUFFERS ARE FULL */
    if (++r_index == r_buffer)
    {
        /* CHECK CPU SYNCHRONIZATION FLAG */
#if ERROR_CHECK
        /* if (buffer_rcvd == TRUE) error_in_real_time(); */
        if (buffer_rcvd == TRUE) for(;;);
#endif
        swap = input0;
        input0 = input_xfer0;
        input_xfer0 = swap;
        swap = input1;
        input1 = input_xfer1;
        input_xfer1 = swap;
        r_index = 0;
        buffer_rcvd = TRUE;
    }
    if (++t_index == t_buffer)
    {
        /* CHECK CPU SYNCHRONIZATION FLAG */
#if ERROR_CHECK
        /* if (buffer_xmtd == TRUE) error_in_real_time(); */
        if (buffer_xmtd == TRUE) for(;;);
#endif
        swap = output0;
        output0 = output_xfer0;
        output_xfer0 = swap;
        swap = output1;
        output1 = output_xfer1;
        output_xfer1 = swap;
        t_index = 0;
        buffer_xmtd = TRUE;
    }
}
/*********************************************************************/
/* INIT_ARRAYS(): INITIALIZE DATA ARRAY PARAMETERS                   */
/*********************************************************************/
void init_arrays(int t_buffer, int r_buffer)
{
    /*****************************************************************/
    /* INITIALIZE AND ZERO FILL ARRAYS                               */
    /*****************************************************************/
    if (!(input0 = (float *) calloc(r_buffer, sizeof(float))))
        heap_overflow();
    if (!(output0 = (float *) calloc(t_buffer, sizeof(float))))
        heap_overflow();
    if (!(input_xfer0 = (float *) calloc(r_buffer, sizeof(float))))
        heap_overflow();
    if (!(output_xfer0 = (float *) calloc(t_buffer, sizeof(float))))
        heap_overflow();
    if (!(input1 = (float *) calloc(r_buffer, sizeof(float))))
        heap_overflow();
    if (!(output1 = (float *) calloc(t_buffer, sizeof(float))))
        heap_overflow();
    if (!(input_xfer1 = (float *) calloc(r_buffer, sizeof(float))))
        heap_overflow();
    if (!(output_xfer1 = (float *) calloc(t_buffer, sizeof(float))))
        heap_overflow();

    for (i = 0; i < t_buffer; i++)
    {
        output0[i] = output_xfer0[i] = 0.0;
        output1[i] = output_xfer1[i] = 0.0;
    }
}

/*********************************************************************/
/* INIT_BB(): INITIALIZE COMMUNICATIONS TO DSP102/202                */
/*********************************************************************/
void init_bb(int period_value)
{
    /* RESET D/A, MAKE SURE RESET IS HELD LOW SUFFICIENTLY (?) LONG */
    RESET_BB;
    WAIT(50);
#if GEN_OSC
    /* CONFIGURE C3X TIMER AS BB A/D OSC */
    TIMER_ADDR(OSC_TIMER_NUM)->gcontrol = 0x0;
    TIMER_ADDR(OSC_TIMER_NUM)->counter = 0x0;
    TIMER_ADDR(OSC_TIMER_NUM)->period = 0x0;
    TIMER_ADDR(OSC_TIMER_NUM)->gcontrol = FUNC | GO | HLD_ | CP_ | CLKSRC;
#endif
    /* CONFIGURE SERIAL PORT */
    SERIAL_PORT_ADDR(SER_NUM)->gcontrol = 0x0;
    SERIAL_PORT_ADDR(SER_NUM)->s_x_control = CLKXFUNC | DXFUNC | FSXFUNC;
    SERIAL_PORT_ADDR(SER_NUM)->s_r_control = CLKRFUNC | DRFUNC | FSRFUNC;
    SERIAL_PORT_ADDR(SER_NUM)->s_rxt_control = 0x0F;
    SERIAL_PORT_ADDR(SER_NUM)->s_rxt_period = 0x0;
    SERIAL_PORT_ADDR(SER_NUM)->gcontrol = XCLKSRCE | XLEN_32 | RLEN_32 |
                                          XINT | XRESET | RRESET;

    /* CLEAR SERIAL TRANSMIT DATA */
    SERIAL_PORT_ADDR(SER_NUM)->x_data = 0x0;

    /* TAKE A/D,D/A OUT OF RESET, (OPTIONALLY) CLEAR THE INT FLAG REG, */
    /* ENABLE THE APPROPRIATE SERIAL PORT TRANSMIT INT AND ENABLE      */
    /* GLOBAL INTERRUPTS                                               */
    UN_RESET_BB;
    CL_INT_FL_REG;
#if SER_NUM
    EN_SER_PORT_XMT_INT_1;
#else
    EN_SER_PORT_XMT_INT_0;
#endif
    EN_GLOBAL_INTS;

#if GEN_CC
    /* CONFIGURE C3X TIMER 1 AS BB A/D,D/A CONVERT CLOCK */
    TIMER_ADDR(CC_TIMER_NUM)->gcontrol = 0x0;
    TIMER_ADDR(CC_TIMER_NUM)->counter = 0x0;
    TIMER_ADDR(CC_TIMER_NUM)->period = period_value;
    TIMER_ADDR(CC_TIMER_NUM)->gcontrol = FUNC | GO | HLD_ | CLKSRC;
#endif
}
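
The buffer handling performed by the interrupt service routine in Example 8-2 is a double-buffering (ping-pong) scheme: the ISR fills one buffer while the processing code owns the other, and the two are swapped when the ISR buffer is full. The following host-side C model sketches that idea for a single channel. All names are hypothetical, and where the driver halts in an endless loop on a real-time error, the model only records an overrun flag.

```c
#include <stddef.h>
#include <string.h>

#define PP_BLOCK 4   /* samples per buffer (small for illustration) */

typedef struct {
    float  storage[2][PP_BLOCK];
    float *fill;     /* buffer being filled sample by sample (ISR side) */
    float *ready;    /* buffer owned by the processing code             */
    size_t index;    /* next position in the fill buffer                */
    int    full;     /* set when a block is handed over; the consumer   */
                     /* clears it after processing the ready buffer     */
    int    overrun;  /* set if a block completes before the consumer    */
                     /* released the previous one                       */
} pingpong;

void pp_init(pingpong *p)
{
    memset(p, 0, sizeof *p);
    p->fill  = p->storage[0];
    p->ready = p->storage[1];
}

/* Called once per received sample, as the ISR would be. */
void pp_push(pingpong *p, float sample)
{
    p->fill[p->index] = sample;
    if (++p->index == PP_BLOCK) {
        float *swap;
        if (p->full)
            p->overrun = 1;       /* consumer fell behind real time */
        swap     = p->ready;      /* exchange the two buffers       */
        p->ready = p->fill;
        p->fill  = swap;
        p->index = 0;
        p->full  = 1;
    }
}
```

A consumer would wait until full is set, process the ready buffer, and then clear full, which mirrors the roles of the WAIT_BUFFERS and RESET_FLAGS macros in the driver header.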

Example 8-3. General Macro Definitions


/*********************************************************************/
/* general.h v4.2                                                    */
/* Copyright (c) 1991 Texas Instruments Incorporated                 */
/*********************************************************************/
#ifndef _GENERAL
#define _GENERAL

/*********************************************************************/
/* COMMON MACRO DEFINITIONS                                          */
/*********************************************************************/
#ifndef OFF
#define OFF 0x00
#endif

#ifndef ON
#define ON 0x01
#endif

#ifndef FALSE
#define FALSE 0x00
#endif

#ifndef TRUE
#define TRUE 0x01
#endif

#ifndef CLEAR
#define CLEAR 0x00
#endif

#ifndef SET
#define SET 0x01
#endif

/*********************************************************************/
/* GENERAL C3x MACROS                                                */
/*********************************************************************/
#ifndef INIT_XF_PINS
#define INIT_XF_PINS asm(" LDI 00h,IOF")
#endif

#ifndef CL_INT_FL_REG
#define CL_INT_FL_REG asm(" LDI 0h,IF")
#endif

#ifndef EN_GLOBAL_INTS
#define EN_GLOBAL_INTS asm(" OR 2000h,ST")
#endif

#ifndef EN_SER_PORT_XMT_INT_0
#define EN_SER_PORT_XMT_INT_0 asm(" OR 10h,IE")
#endif

#ifndef EN_SER_PORT_RCV_INT_0
#define EN_SER_PORT_RCV_INT_0 asm(" OR 20h,IE")
#endif

#ifndef EN_SER_PORT_XMT_INT_1
#define EN_SER_PORT_XMT_INT_1 asm(" OR 40h,IE")
#endif

#ifndef EN_SER_PORT_RCV_INT_1
#define EN_SER_PORT_RCV_INT_1 asm(" OR 80h,IE")
#endif

#ifndef ENABLE_CACHE
#define ENABLE_CACHE asm(" OR 800h,ST")
#endif

#endif /* #ifndef _GENERAL */

Example 8-4. Common Driver Header File


/*********************************************************************/
/* COMMDRVR.H                                                        */
/*                                                                   */
/* TMS320C3x - COMMON DRIVER HEADER FILE                             */
/*********************************************************************/
#include <c30_per.h>

/*********************************************************************/
/* COMMON STRUCTURES                                                 */
/*********************************************************************/
typedef volatile int VI;
typedef volatile float VF;
typedef VF * volatile VPVF;
typedef VI * volatile VPVI;

/*********************************************************************/
/* FUNCTION PROTOTYPES                                               */
/*********************************************************************/
void c_int99(void);
void heap_overflow(void);
void init_c30(void);
void error_in_real_time(void);

8.4 TLC32040 Interface to the TMS320C3x


Figure 8-6 shows how to interface the ’C3x with zero glue logic to a Texas
Instruments TLC32040 14-bit analog interface circuit (AIC). The following
sections describe the steps required to initialize and set up the ’C3x timer and
serial port, and to reset and program the TLC32040.


Figure 8-6. TMS320C3x-to-TLC32040 Interface


’C3x pin          TLC32040 pin
TCLK0             MCLK
XF0               RESET
DR0               DR
DX0               DX
FSX0              FSX
FSR0              FSR
CLKX0, CLKR0      SCLK

(TLC32040 analog pins shown in the figure: OUT+, OUT-, IN+, IN-)


8.4.1 Resetting the Analog Interface Circuit 


The ’C31’s XF0 signal is connected to the RESET signal of the AIC. By toggling
the RESET signal, the ’C31 can reset the AIC. This is achieved by executing
the following instructions:


        rpts  40          ; Execute next instruction 40x
        ldi   2h,IOF      ; Pull AIC into reset
        ldi   6h,IOF      ; Pull AIC out of reset
8.4.2 Initializing the TMS320C31 Timer


The ’C31’s timer (TCLK0) signal is connected to the AIC’s master clock
(MCLK) signal. The MCLK signal drives all the key logic signals of the AIC,
such as the shift clock, the switched-capacitor filter clocks, and the ADC and
DAC timing signals. The timer pulses the TCLK0 signal whenever the ’C31 timer
counter register (which is memory mapped to 0x808024) counts up to the
value in the timer period register (which is memory mapped to 0x808028).
Then, the timer counter register resets to 0 and repeats. (For a detailed
description of the ’C31 timer, see the TMS320C3x User’s Guide.) Because of
differences between the maximum frequency of the ’C31’s timer and the
maximum and minimum frequencies of the AIC, observe the following constraints:


- Minimum Timer Period Register Value. The ’C31 running at 50 MHz can
  generate a maximum timer frequency of 12.5 MHz (CLKIN/4), which is
  above the AIC’s tested master clock frequency maximum of 10 MHz. If you
  use frequencies beyond those listed in the TLC32040 Data Sheet, the
  resulting performance can be unpredictable. If the timer is run in pulse
  mode (control value is 0x2C1), the minimum period of 1 results in a
  12.5-MHz master pulse rate and a period of 2 results in 6.25 MHz. See the
  TLC32040 Data Sheet for more information.

- Maximum Timer Period Register Value. The AIC’s minimum master
  clock frequency is 75 kHz. Taking into account the ’C31 maximum timer
  frequency of 12.5 MHz and the AIC’s minimum master clock frequency,
  the maximum value in the ’C31’s timer period register must be 165
  (12.5 MHz/75 kHz = 166.7). The ’C31’s timer counts down to 0; therefore,
  you must subtract 1 from this number (166 - 1 = 165). The TLC32040
  specification describes a minimum clock frequency, since the internal
  signals of the AIC are stored in capacitors that must be periodically updated.
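
The two constraints above can be checked numerically. The following C sketch is illustrative only: the helper names and the limit macros are hypothetical, and it assumes the pulse-mode relationship implied by the text (MCLK equals the timer clock divided by the period register value).

```c
#include <assert.h>

/* AIC master clock limits quoted in the text (hypothetical macro names). */
#define AIC_MCLK_MIN_HZ 75000L
#define AIC_MCLK_MAX_HZ 10000000L

/* Pulse-mode MCLK for a given timer clock and period register value. */
long pulse_mode_mclk_hz(long timer_clk_hz, long period)
{
    assert(period >= 1);             /* period must be at least 1 here */
    return timer_clk_hz / period;
}

/* Returns 1 when the resulting MCLK lies inside the AIC limits. */
int aic_period_ok(long timer_clk_hz, long period)
{
    long mclk = pulse_mode_mclk_hz(timer_clk_hz, period);
    return mclk >= AIC_MCLK_MIN_HZ && mclk <= AIC_MCLK_MAX_HZ;
}
```

With a 12.5-MHz timer clock, a period of 1 is rejected (12.5 MHz exceeds 10 MHz), a period of 2 is accepted (6.25 MHz), and a period of 165 is still accepted.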


The following ’C31 assembly code initializes timer 0 in clock mode to generate
a square wave on the TCLK0 pin at a frequency of 6.25 MHz (timer period = 1):


TGCR0   .set   808020h       ; Timer 0 global control register
TCNT0   .set   808024h       ; Timer 0 counter register
TPR0    .set   808028h       ; Timer 0 period register
TIMVAL  .word  3c1h          ; Timer global control register value
        ldp    @TGCR0        ; Set Data Page
        ldi    0h, R4        ; Initialize R4 to zero
        ldi    1h, R0        ; Initialize R0 to 1
        sti    R4, @TGCR0    ; Reset timer0
        sti    R0, @TPR0     ; Store timer0 period
        sti    R4, @TCNT0    ; Reset timer0 counter
        ldi    @TIMVAL,R7    ; Load timer control value
        sti    R7,@TGCR0     ; Start timer 0
A period of 0 is not allowed in pulse mode. If the timer is run in clock mode, the
resulting output is a square wave with a frequency of half that of pulse mode.
A period of 0 is allowed in clock mode, resulting in a 12.5-MHz clock.
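
As a rough illustration of the two timer modes, the following C sketch computes the TCLK0 output frequency from the timer clock and the period register value. The function and enum names are hypothetical; the formulas are taken from the statements above (clock mode runs at half the pulse-mode rate, and a period of 0 in clock mode yields the full timer clock).

```c
#include <assert.h>

enum timer_mode { TIMER_PULSE, TIMER_CLOCK };

/* Output frequency on the timer pin for a given period register value. */
long timer_output_hz(long timer_clk_hz, long period, enum timer_mode mode)
{
    if (mode == TIMER_PULSE) {
        assert(period >= 1);          /* period 0 is not allowed in pulse mode */
        return timer_clk_hz / period;
    }
    assert(period >= 0);
    if (period == 0)
        return timer_clk_hz;          /* clock-mode special case from the text */
    return timer_clk_hz / (2 * period);
}
```

For a 12.5-MHz timer clock, clock mode with a period of 1 gives 6.25 MHz, which matches the initialization code shown earlier.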


8.4.3 Initializing the TMS320C31 Serial Port


This section explains how to initialize the:


- ’C31 serial port
- ’C31 serial-port control register (memory mapped to 0x808040)
- FSX/DX/CLKX control register (memory mapped to 0x808042)
- FSR/DR/CLKR control register (memory mapped to 0x808043)


For a detailed description of the ’C31 serial port, see the TMS320C3x User's 
Guide. 


Example 8-5 shows the assembly code to initialize the serial port global
control register (SGCR0) for the ’C31 in the following manner:


1) Issue transmit and receive resets
2) Enable receive and transmit interrupts
3) Set 16-bit receive and transmit transfers
4) Set FSX and FSR, CLKX and CLKR active low
5) Set continuous mode
6) Set variable data rate transfers


See the example code supplied with the DSP for help on setting up the AIC. 


Example 8-5. Initialize the Serial Port Global Control Register 


SGCR0   .set   808040h       ; Serial port 0 global control register
SPCX0   .set   808042h       ; Serial port 0 FSX/DX/CLKX control reg.
SPCR0   .set   808043h       ; Serial port 0 FSR/DR/CLKR control reg.
SINIT0  .word  0e973300h     ; Enable RINT & 16-bit transfers
SINIT1  .word  111h          ; Configure as serial port pins
        ldp    @SGCR0        ; Set Data Page
        ldi    0h, R4        ; Initialize R4 to zero
        sti    R4,@SGCR0     ;
        ldi    @SINIT1,R7    ; Reset and
        sti    R7,@SPCX0     ; initialize serial port
        sti    R7,@SPCR0     ; initialize serial port
        ldi    @SINIT0,R7    ; Reset and
        sti    R7,@SGCR0     ; initialize serial port




8.4.4 Initializing the AIC


Once the C31 supplies MCLK, initializes its serial port, and resets the AIC, you 
can initialize the AIC to a specified sample rate. The AIC sampling rate is deter- 
mined by the values of two registers (Tx counter A and Tx counter B) in the 
AIC’s transmit and receive sections. These values are loaded into the respec- 
tive counter whenever the counter counts down to 0. The Tx counters A and 
B determine the D/A conversion timing. The Rx counters A and B determine 
the A/D conversion timing. For more information, see the TLC32040 AIC Data 
Sheet. The formula for the conversion frequency is given in Equation 8-1. 


Equation 8-1. Conversion Frequency

$$\text{Conversion frequency} = \frac{\text{MCLK}}{2 \times A \times B}$$

To ensure that the switched-capacitor lowpass and bandpass filters meet their 
transfer function characteristics, the frequency of the clock inputs of the 
switched-capacitor filter must be 288 kHz. Otherwise, the upper and lower cut- 
off frequencies of the lowpass and bandpass are scaled accordingly. 
Equation 8—2 shows the switched-capacitor filter frequency. 


Equation 8-2. Switched Capacitor Filter Frequency

$$\text{SCF clock frequency} = \frac{\text{MCLK}}{2 \times A}$$


For example, for an 8-kHz sampling rate with an MCLK of 6.25 MHz and the
required 288-kHz SCF clock, Equation 8-2 results in a Tx counter A of 11
[A = MCLK / (2 x SCF) = 10.85, rounded to the nearest integer]. Using
Equation 8-1, Tx counter B results in 36 [B = MCLK / (2 x A x
Conversion_Frequency) = 35.5, rounded to the nearest integer].
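
The counter arithmetic in this example can be sketched in C as follows. The function names are hypothetical, and rounding to the nearest integer is an assumption chosen because it reproduces the values 11 and 36 quoted in the text.

```c
#include <assert.h>

/* Round-to-nearest integer division for non-negative operands. */
static long div_round(long num, long den)
{
    assert(num >= 0 && den > 0);
    return (num + den / 2) / den;
}

/* A = MCLK / (2 x SCF clock frequency) */
long aic_counter_a(long mclk_hz, long scf_hz)
{
    return div_round(mclk_hz, 2 * scf_hz);
}

/* B = MCLK / (2 x A x conversion frequency) */
long aic_counter_b(long mclk_hz, long a, long fs_hz)
{
    return div_round(mclk_hz, 2 * a * fs_hz);
}
```

Calling `aic_counter_a(6250000, 288000)` and then `aic_counter_b(6250000, 11, 8000)` reproduces the counter values used in the example.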


To initialize the AIC’s Tx counter A and B registers, you must send a primary
communication followed by a secondary communication (as explained in the
following sections). Primary communications load values into the D/A while
secondary communications load A/D internal registers, such as the control
register, Tx counters A and B, and Rx counters A and B.
8.4.4.1 Primary Communications


Primary communications have a data value in the 14 MSBs (D15-D2) of data
and a mode selection in the two least significant bits (LSBs) (D1-D0). This
format is shown in Figure 8-7.


The AIC sends the data value to the DAC and enables one of the modes shown 
in Table 8—2, depending on the two LSBs. 


Figure 8-7. Primary Communication Data Format

D15 D14 D13 D12 D11 D10 D9 D8 D7 D6 D5 D4 D3 D2 | D1 D0
                   DAC value                    | Mode selection


Table 8-2. Primary Communications Mode Selection

LSBs   Mode
00     Tx counter A <- TA,        Rx counter A <- RA
       Tx counter B <- TB,        Rx counter B <- RB
01     Tx counter A <- TA + TA’,  Rx counter A <- RA + RA’
       Tx counter B <- TB,        Rx counter B <- RB
10     Tx counter A <- TA - TA’,  Rx counter A <- RA - RA’
       Tx counter B <- TB,        Rx counter B <- RB
11     Tx counter A <- TA,        Rx counter A <- RA
       Tx counter B <- TB,        Rx counter B <- RB


The second and third modes use the TA’ and RA’ registers to advance or slow
down the sampling frequency by respectively shortening or lengthening the
sample period. This is particularly useful in modem applications, where it can
enhance the signal-to-noise performance, perform frequency-tracking
functions, and generate nonstandard modem frequencies.
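
The primary communication layout described above (a 14-bit data value in D15-D2 and a 2-bit mode in D1-D0) can be packed as in the following C sketch. The function name is hypothetical, and the data value is treated as a raw 14-bit field; how that field is interpreted by the DAC is left to the TLC32040 Data Sheet.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a raw 14-bit data field and a 2-bit mode into one 16-bit word. */
uint16_t aic_primary_word(uint16_t value14, unsigned mode)
{
    assert(value14 <= 0x3FFFu);   /* data field occupies D15-D2 */
    assert(mode <= 3u);           /* mode selection occupies D1-D0 */
    return (uint16_t)((value14 << 2) | mode);
}
```

A mode value of 3 (binary 11) marks the word as being followed by a secondary communication.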


8.4.4.2 Secondary Communications 


Secondary communication follows a primary communication that has the two
LSBs set to 11. This secondary communication programs the AIC by
loading the A, A’, B, or control registers. Figure 8-8 shows the secondary
communication data format. The TA, RA, TB, and RB values are unsigned. The TA’
and RA’ values are in signed 2s-complement format. The control register
enables bandpass filters and asynchronous transmit/receive, enables and
disables auxiliary inputs, and changes input gain.
Table 8-3 describes the control register bit fields.


Figure 8-8. Secondary Communication Data Format

Upper field (D15-D8 side)                     Lower field (D7-D2 side)                      D1 D0
X  TA register value (unsigned)               X  RA register value (unsigned)               0  0
X  TA’ register value (signed 2s complement)  X  RA’ register value (signed 2s complement)  0  1
X  TB register value (unsigned)               X  RB register value (unsigned)               1  0
X                                             Control register                              1  1

X = don’t care


Table 8-3. Control Register Bit Fields

Bits    Field               Setting
D7 D6   Input gain          0 0 = 1X for ±6-V analog input
                            0 1 = 2X for ±3-V analog input
                            1 0 = 4X for ±1.5-V analog input
                            1 1 = 1X for ±6-V analog input
D5      Transmit/receive    0 = asynchronous
                            1 = synchronous
D4      AUX IN pins         0 = disables
                            1 = enables
D3      Loopback function   0 = disables
                            1 = enables
D2      Bandpass filter     0 = deletes
                            1 = inserts
D1 D0   Register select     1 1
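
The A-register and B-register secondary words can be assembled as in the following C sketch, which mirrors the expressions (TA<<9)+(RA<<2)+0 and (TB<<9)+(RB<<2)+2 used in Example 8-6. The function names are hypothetical, and the range checks only prevent the two fields from overlapping; the true field widths should be taken from the TLC32040 Data Sheet.

```c
#include <assert.h>
#include <stdint.h>

/* Secondary word that loads the TA and RA registers (D1 D0 = 00). */
uint16_t aic_a_word(unsigned ta, unsigned ra)
{
    assert(ta < 128u);   /* keeps TA inside the 16-bit word */
    assert(ra < 128u);   /* keeps RA from spilling into the TA field */
    return (uint16_t)((ta << 9) + (ra << 2) + 0u);
}

/* Secondary word that loads the TB and RB registers (D1 D0 = 10). */
uint16_t aic_b_word(unsigned tb, unsigned rb)
{
    assert(tb < 128u);
    assert(rb < 128u);
    return (uint16_t)((tb << 9) + (rb << 2) + 2u);
}
```

With TA = RA = 12 and TB = RB = 15, the values defined in Example 8-6, these helpers produce 0x1830 and 0x1E3E.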


The assembly code in Example 8-6 sets the TA and TB registers of the AIC.
This code transmits a 16-bit word to the AIC and then waits until the transmit
interrupt is generated by the serial port. Four commands are transmitted
starting with a 0, then the TB and RB values, followed by the TA and RA values,
and finally the control word. TA and RA values should be the last values
transmitted, since they change the AIC sample rate. By transmitting these values
last, the sample rate is not changed until the AIC receives the last program
word. In this way, very high sample rates can be achieved. Each command
transmits three 16-bit words: a primary communication, a secondary
communication, and a zero-data word.
Example 8-6. Setting the TA and TB Registers


; LOOPAIC.ASM is an example program which shows how to initialize and use
; the TLC32040. The analog output (DAC output) is either a ramp signal
; (RAMPEN=1) or a loopback of the analog input (RAMPEN=0).

; Define constants used by program
RAMPEN        .set   0                    ; Set to 1 to generate ramp at AOUT
T0_ctrl       .set   0x808020             ; TIM0 gl control
T0_count      .set   0x808024             ; TIM0 count
T0_prd        .set   0x808028             ; TIM0 prd
S0_gctrl      .set   0x808040             ; SP 0 global control
S0_xctrl      .set   0x808042             ; SP 0 FSX/DX/CLKX port ctl
S0_rctrl      .set   0x808043             ; SP 0 FSR/DR/CLKR port ctl
S0_xdata      .set   0x808048             ; SP 0 Data transmit
S0_rdata      .set   0x80804C             ; SP 0 Data receive
TA            .set   12                   ; AIC timing register values
TB            .set   15
RA            .set   12
RB            .set   15
GIE           .set   0x2000               ; This bit in ST turns on interrupts

A_REG         .word  (TA<<9)+(RA<<2)+0    ; A registers
B_REG         .word  (TB<<9)+(RB<<2)+2    ; B registers
C_REG         .word  10000011b            ; control
S0_gctrl_val  .word  0x0E970300           ; Serial port control register
S0_xctrl_val  .word  0x00000111           ; values
S0_rctrl_val  .word  0x00000111
RAMP          .word  0                    ; RAMP count value
ADC_last      .word  0                    ; Last received ADC value


;***********************************************************
; Begin main code loop here
;***********************************************************
main      or     GIE,ST            ; Turn on INTS
          ldi    0x30,IE           ; Enable XINT/RINT
          call   INIT
          b      main              ; Do it again!
;-----------------------------------------------------------
DAC2      push   ST                ; DAC Interrupt service routine
          push   R3                ;
          .if    RAMPEN            ; If RAMPEN=1 assemble this code
          ldi    @RAMP,R3          ;
          addi   256,R3            ; Add a value to RAMP
          sti    R3,@RAMP          ;
          .else                    ; Else assemble this
          ldi    @ADC_last,R3      ;
          .endif                   ;
          andn   3,R3              ;
          sti    R3,@S0_xdata      ; Output the new DAC value
          pop    R3                ;
          pop    ST                ;
          reti                     ;
;-----------------------------------------------------------
ADC2      push   ST                ;
          push   R3                ;
          ldi    @S0_rdata,R3      ;
          sti    R3,@ADC_last      ;
          pop    R3                ;
          pop    ST                ;
          reti                     ;

;***********************************************************
; The startup stub is used during initialization only
; and can be safely overwritten by the stack or data
;***********************************************************
          .entry ST_STUB           ; Debugger starts here
INIT      ldp    T0_ctrl           ; Use kernel data page and stack
          ldi    0,R0              ; Halt TIM0 & TIM1
          sti    R0,@T0_ctrl       ;
          sti    R0,@T0_count      ; Set counts to 0
          ldi    1,R0              ; Set periods to 1
          sti    R0,@T0_prd        ;
          ldi    0x2C1,R0          ; Restart both timers in pulse mode
          sti    R0,@T0_ctrl       ;

          ldi    @S0_xctrl_val,R0  ;
          sti    R0,@S0_xctrl      ; transmit control
          ldi    @S0_rctrl_val,R0  ;
          sti    R0,@S0_rctrl      ; receive control
          ldi    0,R0              ;
          sti    R0,@S0_xdata      ; DXR data value
          ldi    @S0_gctrl_val,R0  ; Setup serial port
          sti    R0,@S0_gctrl      ; global control
;-----------------------------------------------------------
; This section of code initializes the AIC
;-----------------------------------------------------------
AIC_INIT  ldi    0x10,IE           ; Enable only XINT interrupt
          andn   0x34,IF           ;
          ldi    0,R0              ;
          sti    R0,@S0_xdata      ;
          rpts   0x040             ;
          ldi    2,IOF             ; XF0=0 resets AIC
          rpts   0x40              ;
          ldi    6,IOF             ; XF0=1 runs AIC
;
          ldi    @C_REG,R0         ; Setup control register
          call   prog_AIC          ;
          ldi    0xfffc,R0         ; Program the AIC to be real slow
          call   prog_AIC          ;
          ldi    0xfffc|2,R0       ;
          call   prog_AIC          ;
          ldi    @B_REG,R0         ; Bump up the Fs to final rate
          call   prog_AIC          ; (smallest divisor should be last)
          ldi    @A_REG,R0         ;
          call   prog_AIC          ;
          b      main
;
prog_AIC  ldi    @S0_xdata,R1      ; Use original DXR data during 2ndy
          sti    R1,@S0_xdata      ;
          idle
          ldi    @S0_xdata,R1      ; Use original DXR data during 2ndy
          or     3,R1              ; Request 2ndy XMIT
          sti    R1,@S0_xdata      ;
          idle                     ;
          sti    R0,@S0_xdata      ; Send register value
          idle                     ;
          andn   3,R1              ;
          sti    R1,@S0_xdata      ; Leave with original safe value in DXR
;
          ldi    @S0_rdata,R0      ; Fix the receiver underrun by reading
          rets                     ; the DRR before going to the main loop
;***********************************************************
; Install the XINT/RINT ISR handler directly into
; the vector RAM location it will be used for
;***********************************************************
          .start "SP0VECTS",0x809FC5
          .sect  "SP0VECTS"
          B      DAC2              ; XINT0
          B      ADC2              ; RINT0
8.5 TLC320AD58 Interface to the TMS320C3x


The TLC320AD58C serial interface provides several master and slave modes
for 16-bit or 18-bit data output. This makes it compatible with a wide range
of DSPs. To interface with the ’C3x 32-bit floating-point DSP, the 18-bit master
mode “100” was chosen to get an 18-bit resolution result and meet the ’C3x
serial port requirements. The timing diagram is shown in Figure 8-9.


Figure 8-9. TLC320AD58C Serial Interface 18-bit Master Mode “100” Timing Diagram 


[Timing diagram: SCLK, FSYNC, LRCLK, and DOUT over one 64-SCLK frame. The
left channel (MSB first) occupies the first 32 SCLKs and the right channel
(MSB first) occupies the next 32 SCLKs. A ’C3x serial port receive interrupt
follows each 32-SCLK channel slot.]


The frame sync signal (FSYNC) is then used to designate valid data from the 
ADC and is active for one shift clock period. After the falling edge of FSYNC, 
the left channel data is shifted out on the falling edge of SCLK with the MSB 
(D17) first. When the last data bit is shifted out, the output remains low for 
another 14 SCLKs to get a total of 32 SCLK periods each channel. After 32 
SCLKs, LRCLK goes low and the right channel data is then shifted out. FSYNC 
and LRCLK frequency are fixed to the sampling frequency (Fs = MCLK/256 or 
MCLK/384, depending on the status of the CMODE input pin). The conversion 
cycle is synchronized to the rising edge of LRCLK and, therefore, to the falling 
edge of FSYNC. Although data is shifted out in two separate time packets rep- 
resenting the left and right channel digital outputs, the analog inputs are 
sampled and converted simultaneously. In the master mode, SCLK, FSYNC, 
and LRCLK are generated internally from MCLK, depending on the status of 
the CMODE input pin, as shown in Table 8-4. 
Table 8-4. Master-Clock-to-Sample-Rate Conversion

MCLK (MHz)   CMODE   SCLK (MHz)   Sample Rate (kHz)
12.288       Low     3.072        48
18.432       High    3.072        48
11.290       Low     2.8224       44.1
16.934       High    2.8224       44.1
8.192        Low     2.048        32
12.288       High    2.048        32
0.256        Low     0.064        1
0.384        High    0.064        1
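
The relationships behind Table 8-4 can be expressed as a short C sketch. The function names are hypothetical; the divisors 256 and 384 come from the text (Fs = MCLK/256 or MCLK/384 depending on CMODE), and the factor of 64 reflects the 64 SCLK periods per sample frame shown in Figure 8-9.

```c
#include <assert.h>

/* Sample rate from the master clock and the CMODE pin level. */
long ad58_sample_rate_hz(long mclk_hz, int cmode_high)
{
    assert(mclk_hz > 0);
    return mclk_hz / (cmode_high ? 384 : 256);
}

/* Shift clock frequency: 64 SCLK periods per sample frame. */
long ad58_sclk_hz(long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return 64 * sample_rate_hz;
}
```

For example, a 12.288-MHz MCLK with CMODE low gives a 48-kHz sample rate and a 3.072-MHz SCLK, matching the first table row.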


The ’C30 uses two bidirectional serial ports; the ’C31 and ’C32 each have one.
Each serial port controls six port pins for receiving/transmitting data:
FSR/FSX, CLKR/CLKX, and DR/DX. Figure 8-10 shows the glueless
interface to the TLC320AD58C using the SCLK, FSYNC, and DOUT signals. Mode
“100” is set by pulling the MODE1 and MODE2 pins low and the MODE0 pin
high. The master clock is derived from the ’C3x to make sure all clock signals
are synchronized. The ’C3x is running at 49.152 MHz and provides the
required MCLK frequency of 12.288 MHz at the timer 0 output pin in order to
get a 48-kHz sample rate. CMODE must be pulled low. If other sample rates
are required, see Table 8-4.


The TLC320AD58C analog function blocks are initialized together with the
DSP by a system reset after all supply voltages are stable. The digital function
blocks are initialized by pulling down DIGPD for several microseconds. After
the rising edge of DIGPD, the device resumes normal operation. When DIGPD
is low, the TLC320AD58C digital function blocks are shut down and power
consumption is reduced. However, if power-down mode is not required, this
signal can be tied to ANAPD. In both cases, refer to the TI Data Acquisition
Circuits Data Book for setup timing requirements. All digital inputs and outputs
of the ’C3x and the TLC320AD58C are 5-V TTL compatible. To reduce ringing and
overshoot, a series damping resistor (50 Ω) is recommended for the master
clock signal.
Figure 8-10. Interface Between the TMS320C3x and the TLC320AD58C

[Connection diagram. Recoverable connections: TLC320AD58C MCLK driven by
’C3x TOUT0 (12.288 MHz); DIGPD driven by XF0; ANAPD driven by the system
RESET; CMODE tied to VSS.]
The ’C3x can be configured to receive a maximum of 32 bits of data per word.
However, the TLC320AD58C transmits a total of 64 bits after the FSYNC pulse
appears. This forces the DSP to read the left and right channels back-to-back.
To accomplish this, the ’C3x serial port configuration is toggled between
continuous mode and burst mode. In burst mode, FSYNC indicates the start of a
new data transfer. In continuous mode, the new data transfer starts
immediately after the last bit of the previous transfer has been shifted out.
Both the serial port and the timer registers are memory mapped. Eight
memory-mapped registers are provided for each serial port:


- One global control register, which defines the serial port configuration
- Two control registers, which set the function of the CLKX/CLKR and FSX/FSR
  pins
- Three receive/transmit timer registers
- One data receive register
- One data transmit register


If the serial port shift clock (CLKR/CLKX) is generated externally, the corre- 
sponding timer can be used as a general-purpose timer. See the TMS320C3x 
User’s Guide for more information on the ’C3x serial port. 




Example 8-7 shows the C code for interfacing a TLC320AD58 to the ’C3x. 
Example 8-8 (page 8-36) shows the header file for the C code of 
Example 8-7. Example 8-9 (page 8-38) shows the interrupt table vector list- 
ing. These examples perform the following tasks: 


- Initialize the TLC320AD58C and the ’C30 serial port 1 to meet the
  TLC320AD58C serial interface timing requirements

- Set up the timer 0 period register to generate the required MCLK
  frequency


On a serial port 1 receive interrupt, which occurs after receiving 32 bits from
either the left channel or right channel, the program reads from the serial port
receive register and converts the input signal into a floating-point number
within the range of -1.0 to 1.0. It then changes the serial port configuration
from burst to continuous mode when the right channel has been received, or from
continuous to burst mode when the left channel has been received. The
transmit port is configured in the same way as the receive port for connection
to the 18-bit TMS57014A stereo DAC. Remember that the data has to be written
to the data transmit register no later than three CLKX cycles before the FSYNC
pulse occurs (in burst mode) or the next transfer starts (in continuous mode).
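
The conversion step can be sketched in portable C as follows. This is an illustration rather than the listing’s own code: the function name is hypothetical, the 18-bit sample is assumed to sit left-justified in the 32-bit receive word (as implied by the shift of 14 in Example 8-7), and the divisor 4.0 * 65536 is copied from that example, which yields values between -0.5 and 0.5.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 32-bit receive word holding a left-justified 18-bit
   2s-complement sample into a float. */
float ad58_word_to_float(uint32_t rx_word)
{
    uint32_t field = rx_word >> 14;                  /* top 18 bits */
    /* Sign-extend the 18-bit field without relying on signed shifts. */
    int32_t sample = (int32_t)(field ^ 0x20000u) - 0x20000;
    assert(sample >= -131072 && sample <= 131071);
    return (float)sample / (4.0f * 65536.0f);
}
```

The explicit sign extension avoids depending on how a particular compiler shifts negative values to the right.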


Example 8-7. Interfacing the 18-bit TLC320AD58 to TMS320C3x 


/*************************************************************************/
/* File: AD58.C                                                          */
/* Interfacing the 18-bit TLC320AD58 to TMS320C3x                        */
/*************************************************************************/

/* include files                                                         */
/*-----------------------------------------------------------------------*/
#include "vectors.h"
#include "c3x.h"

/* global variables                                                      */
/*-----------------------------------------------------------------------*/
float l_channel;
float r_channel;

/*-----------------------------------------------------------------------*/
/* main program                                                          */
/*-----------------------------------------------------------------------*/
void main(void)
{
    asm(" ldi 1000h, ST");        /* clear and enable cache              */
    asm(" ldi 0h, IE");           /* clear all interrupt masks           */
    asm(" ldi 0h, IF");           /* clear all pending interrupts        */
    init_t0();                    /* Generate AD58 MCLK, if required     */
    init_s1();                    /* Initialize serial port 1            */
    init_ad58();
    asm(" ldi _ERINT1_CPU,IE");   /* enable serial port 1 receive int    */
    asm(" or _GIEBIT,ST");        /* global enable interrupts            */
    while(1);                     /* wait on interrupt                   */
}

/*-----------------------------------------------------------------------*/
/* Subroutine to initialize Serial Port 1 to communicate with TLC320AD58 */
/*-----------------------------------------------------------------------*/
void init_s1(void)
{
    serial_port[1][X_PORT] = X1_MODE;
    serial_port[1][R_PORT] = R1_MODE;
    serial_port[1][GLOBAL] = S1_CONFIG;
}

/*-----------------------------------------------------------------------*/
/* Subroutine to initialize Timer 0 to generate TLC320AD58 MCLK          */
/*-----------------------------------------------------------------------*/
void init_t0(void)
{
    timer[0][GLOBAL]    = T0_HOLD;
    timer[0][T_COUNTER] = 0x0;
    timer[0][T_PERIOD]  = T0_PERIOD;
    timer[0][GLOBAL]    = T0_GO;
}

/*-----------------------------------------------------------------------*/
/* Serial Port Receive Interrupt Service Routine                         */
/*-----------------------------------------------------------------------*/
void c_int08(void)
{
    /* reconfigure serial port to receive both channels within one frame sync */
    if (serial_port[1][GLOBAL] & 0x0C00)
    {
        /* read LEFT channel and normalize within -1.0..1.0 */
        l_channel = ((float)(serial_port[1][R_DATA] >> 14))/(4.0*65536);
        /* switch to burst mode */
        serial_port[1][GLOBAL] = serial_port[1][GLOBAL] & 0xFFFFF3FF;
        /* if transmitting to DAC, make sure to write to the transmit register no
           later than 3 SCLK=CLKX cycles before the rising edge of FSYNC */
    }
    else
    {
        /* read RIGHT channel and normalize within -1.0..1.0 */
        r_channel = ((float)(serial_port[1][R_DATA] >> 14))/(4.0*65536);
        /* switch to continuous mode */
        serial_port[1][GLOBAL] = serial_port[1][GLOBAL] | 0x0C00;
        /* if transmitting to DAC, make sure to write to the transmit register no
           later than 3 SCLK=CLKX cycles before the next transfer */
    }
}

/*-----------------------------------------------------------------------*/
/* Subroutine to initialize TLC320AD58                                   */
/*-----------------------------------------------------------------------*/
void init_ad58(void)
{
    asm(" ldi 0010b, IOF");       /* reset XF0, power down AD58          */
    asm(" rpts 2500");            /* wait for 100 usec before            */
    asm(" nop");                  /* asserting DigPwd                    */
    asm(" ldi 0110b, IOF");       /* AD58 normal operation               */
}



Example 8-8. C3x.h, Header File Listing 


/*-----------------------------------------------------------------------*/
/* FILE: C3X.H                                                           */
/* TMS320C3X CONTROL REGISTER SETTINGS TO SETUP INTERFACE WITH           */
/* TLC320AD58 18 BIT MASTER MODE                                         */
/*-----------------------------------------------------------------------*/

/*-----------------------------------------------------------------------*/
/* Serial Port 1 Initialization                                          */
/*-----------------------------------------------------------------------*/
#define X1_MODE    0x00000111 /* FSX/DX/CLKX are serial port pins        */
#define R1_MODE    0x00000111 /* FSR/DR/CLKR are serial port pins        */
#define S1_CONFIG  0x0EBC3C00 /* Serial port configuration               */
                              /* FSX/FSR input                           */
                              /* FSX/FSR signals active high             */
                              /* external CLKX/R                         */
                              /* CLKX/CLKR active low                    */
                              /* fixed data rate mode                    */
                              /* 32-bit data width                       */
                              /* TX/RX interrupts are enabled            */
                              /* XRESET/RRESET set to 0                  */
                              /* (take out of reset)                     */

/*-----------------------------------------------------------------------*/
/* Timer 0 Initialization                                                */
/*-----------------------------------------------------------------------*/
/* TOUT Frequency (clock mode) = 1/[8*CLKIN*T0_PERIOD], if T0_PERIOD > 0 */
/*                             = 1/[4*CLKIN],           if T0_PERIOD = 0 */
#define T0_PERIOD  0          /* TOUT0 = 12.288 MHz for 49.152 MHz CLKIN */
#define T0_HOLD    0x0301     /* clock mode, 50% duty cycle              */
#define T0_GO      0x03C1

/*-----------------------------------------------------------------------*/
/* Interrupt Mask                                                        */
/*-----------------------------------------------------------------------*/
asm("_ERINT1_CPU .set 80h");  /* enable serial port 1 receive int        */
asm("_GIEBIT .set 2000h");    /* global enable interrupts                */

/*-----------------------------------------------------------------------*/
/* TMS320C3X CONTROL REGISTER LOCATIONS                                  */
/*-----------------------------------------------------------------------*/

/*-----------------------------------------------------------------------*/
/* Serial Ports                                                          */
/*-----------------------------------------------------------------------*/
/* SERIAL PORT BASE LOCATION */
volatile int (*serial_port)[16] = (volatile int (*)[16]) 0x808040;
/* SERIAL PORT CONTROL REGISTERS */
#define GLOBAL     0          /* GLOBAL CONTROL                          */
#define X_PORT     2          /* TRANSMIT CONTROL                        */
#define R_PORT     3          /* RECEIVE CONTROL                         */
#define X_DATA     8          /* TRANSMIT DATA                           */
#define R_DATA     12         /* RECEIVE DATA                            */

/*-----------------------------------------------------------------------*/
/* Timer                                                                 */
/*-----------------------------------------------------------------------*/
/* TIMER BASE LOCATION */
volatile int (*timer)[16] = (volatile int (*)[16]) 0x808020;
#define T_COUNTER  4
#define T_PERIOD   8
Example 8-9. TMS320C3x Interrupt Vector Table Listing


/*-----------------------------------------------------------------------*/
/* Filename: vectors.h  Defines interrupt vectors and trap vectors       */
/*                      for C programs                                   */
/*                                                                       */
/* Usage: #include vectors.h                                             */
/*                                                                       */
/* Modifications: If you add interrupt service routines, modify          */
/*                this file to insert the vectors at the proper          */
/*                location in the vector table.                          */
/*-----------------------------------------------------------------------*/
asm("       .global _c_int00");
asm("       .global _c_int08");
asm("       .sect \"vectors\"");
asm("RESET  .word _c_int00  ; external RESET-");
asm("INT0   .word _c_int99  ; external INT0-");
asm("INT1   .word _c_int99  ; external INT1-");
asm("INT2   .word _c_int99  ; external INT2-");
asm("INT3   .word _c_int99  ; external INT3-");
asm("XINT0  .word _c_int99  ; Serial port 0 XMT");
asm("RINT0  .word _c_int99  ; Serial port 0 RCV");
asm("XINT1  .word _c_int99  ; Serial port 1 XMT");
asm("RINT1  .word _c_int08  ; Serial port 1 RCV");
asm("TINT0  .word _c_int99  ; Timer 0");
asm("TINT1  .word _c_int99  ; Timer 1");
asm("DINT   .word _c_int99  ; DMA complete");
asm("       .space 20       ; Reserved space");
asm("TRAP0");
asm("       .loop 28        ; TRAPS 0-27 are");
asm("       .word _c_int99  ; undefined traps");
asm("       .endloop");
asm("       .space 4        ; TRAPS 28-31 reserved");
/*-----------------------------------------------------------------------*/
/* NOTE: Put all interrupt handlers AFTER this next statement!           */
/*-----------------------------------------------------------------------*/
asm("       .text");
void c_int99() { }            /* Spurious interrupt handler              */
8.6 CS4215 Interface to the TMS320C3x


Figure 8-11 shows how to interface the ’C3x with zero glue logic to Crystal 
Semiconductor’s CS4216 16-bit stereo codec. 


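Figure 8-11. TMS320C3x-to-CS4216 Interface

[Connection diagram. Recoverable signals: codec pins SDOUT, SDIN, SCLK,
FSYNC, TSIN, D/C, and RESET; ’C3x pins FSX (shown with FSYNC), FSR (shown
with TSIN), XF0 (driving D/C), and TCLK (driving RESET); one resistor whose
value in kΩ is not legible.]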


Example 8-10 through Example 8-16 show the assembly and C language 
codes with their respective header files that program and interface the ’C3x to 
the CS4215. Example 8-10 shows the CS4215 driver interrupt vector table. 
Example 8-11 (page 8-41) shows the ’C3x serial port transmit interrupt 
service routine. Example 8-12 (page 8-44) and Example 8-13 (page 8-46) 
display the C code header files. Example 8-14 (page 8-47) shows the C 
language common driver routines. Example 8-15 (page 8-49) is the C code 
header file for Example 8-16 (page 8-59), which displays the C language 
driver routines for the CS4215. 


These files can be downloaded from Texas Instruments’ BBS or ftp site
(filename C3x4215.EXE).
Example 8-10. vecs.asm


;***********************************************************************
; vecs.asm
;
; staff
;
; 01-03-92
;
; (C) Texas Instruments Inc., 1992
;
; Refer to the file ’license.txt’ included with
; this package for usage and license information.
;***********************************************************************
************************************************************************
* C3x - CS4215 DRIVER
*
* VECS.ASM
*
* INTERRUPT VECTOR TABLE
*
* (C) 1991 TEXAS INSTRUMENTS, HOUSTON
************************************************************************
* INTERRUPT AND RESET VECTORS
************************************************************************
        .sect  "vecs"        ; interrupt and reset vectors

        .ref   _c_int00      ; compiler defined C initialization reset
        .ref   _c_int06      ; serial port transmit interrupt service routine
        .ref   _c_int08      ; serial port transmit interrupt service routine
        .ref   _c_int99      ; unexpected interrupt handler

reset:  .word  _c_int00
int0:   .word  _c_int99
int1:   .word  _c_int99
int2:   .word  _c_int99
int3:   .word  _c_int99
xint0:  .word  _c_int99
rint0:  .word  _c_int06
xint1:  .word  _c_int99
rint1:  .word  _c_int08
tint0:  .word  _c_int99
tint1:  .word  _c_int99
dint:   .word  _c_int99
CS4215 Interface to the TMS320C3x

;***********************************************************
; c_int.asm
;
; Leor Brenman
;
; 03-16-92
;
; (C) Texas Instruments Inc., 1992
;
; Refer to the file 'license.txt' included with this
; package for usage and license information.
;***********************************************************

************************************************************
* C_INT08(VOID)
*
* Hand-coded assembly language interrupt service routine.
* This serial port transmit ISR supports the CS4215 zero
* chip I/F to the C3x serial port
*
* This ISR has been hand-coded for speed optimization.
*
* Leor Brenman, DSP Applications
* (C) 1991 TEXAS INSTRUMENTS, HOUSTON
************************************************************
        .globl  _c_int08

********************
* global variables
********************
        .global _first_half, _input_xfer0, _input_xfer1, _buffer_size
        .global _buffer_index, _output_xfer0, _output1, _data_control
        .global _output_xfer1, _output0, _input1
        .global _buffer_rdy, _input0

********************
* global variables
********************
        .data
SER_1   .word   808050h         ;place in same page as .bss
                                ;to eliminate push/pop of DP when loading
                                ;serial port one's base address

************************************************************
* FUNCTION DEF _c_int08
************************************************************
        .text
_c_int08:
        PUSH    ST
        PUSH    R0
        PUSHF   R0
        PUSH    AR0
Example 8-11. C_int.asm (Continued)

************************************************************
* if this is the first half of the transmission then goto FRST_HALF
************************************************************
        LDI     @_first_half,R0
        BNZ     FRST_HALF

************************************************************
* else, this is the second half of the transmission
************************************************************
SCND_HALF:
************************************************************
* load AR0 with serial port base address
* do dummy read of serial port to empty control info from serial port
************************************************************
        LDI     @SER_1,AR0
        LDI     *+AR0(12),R0

************************************************************
* get control value and write to serial port while branching to end of ISR
* and set first_half flag to TRUE
************************************************************
        LDI     @_data_control+1,R0
        BD      FIN_S
        STI     R0,*+AR0(8)
        LDI     1,R0
        STI     R0,@_first_half

************************************************************
* This is the first half of the transmission
************************************************************
FRST_HALF:
************************************************************
* push remaining registers
************************************************************
        PUSH    R1
        PUSHF   R1
        PUSH    AR1
        PUSH    IR0

************************************************************
* set first_half flag to FALSE
************************************************************
        LDI     0,R0
        STI     R0,@_first_half

        POP     AR0
        POPF    R0
        POP     R0
        POP     ST
Example 8-12. General.h

/************************************************************/
/* general.h v4.2                                           */
/* Copyright (c) 1991 Texas Instruments Incorporated        */
/************************************************************/
#ifndef _GENERAL
#define _GENERAL

/************************************************************/
/* COMMON MACRO DEFINITIONS                                 */
/************************************************************/
#ifndef OFF
#define OFF 0x00
#endif

#ifndef ON
#define ON 0x01
#endif

#ifndef FALSE
#define FALSE 0x00
#endif

#ifndef TRUE
#define TRUE 0x01
#endif

#ifndef CLEAR
#define CLEAR 0x00
#endif

#ifndef SET
#define SET 0x01
#endif

/************************************************************/
/* GENERAL C3x MACROS                                       */
/************************************************************/
#ifndef INIT_XF_PINS
#define INIT_XF_PINS asm(" LDI 00h,IOF")
#endif

#ifndef CL_INT_FL_REG
#define CL_INT_FL_REG asm(" LDI 0h,IF")
#endif

#ifndef EN_GLOBAL_INTS
#define EN_GLOBAL_INTS asm(" OR 2000h,ST")
#endif

#ifndef EN_SER_PORT_XMT_INT_0
#define EN_SER_PORT_XMT_INT_0 asm(" OR 10h,IE")
#endif

#ifndef EN_SER_PORT_RCV_INT_0
#define EN_SER_PORT_RCV_INT_0 asm(" OR 20h,IE")
#endif

#ifndef EN_SER_PORT_XMT_INT_1
#define EN_SER_PORT_XMT_INT_1 asm(" OR 40h,IE")
#endif

#ifndef EN_SER_PORT_RCV_INT_1
#define EN_SER_PORT_RCV_INT_1 asm(" OR 80h,IE")
#endif

#ifndef ENABLE_CACHE
#define ENABLE_CACHE asm(" OR 800h,ST")
#endif

#endif /* #ifndef _GENERAL */
Example 8-13. Commdrvr.h

/************************************************************/
/* COMMDRVR.H                                               */
/*                                                          */
/* TMS320C3x - COMMON DRIVER HEADER FILE                    */
/* :TMS320C3x CODE                                          */
/* Compile and archive into appropriate driver library      */
/*                                                          */
/* (C) 1991 TEXAS INSTRUMENTS, HOUSTON                      */
/************************************************************/
#include <c30_per.h>

/************************************************************/
/* COMMON STRUCTURES                                        */
/************************************************************/
typedef volatile int VI;
typedef volatile float VF;
typedef VF * volatile VPVF;
typedef VI * volatile VPVI;

/************************************************************/
/* FUNCTION PROTOTYPES                                      */
/************************************************************/
void c_int99(void);
void heap_overflow(void);
void init_c30(void);
void error_in_real_time(void);
Example 8-14. Commdrvr.c

/*************************************************************
  commdrvr.c
  staff
  01-15-92
  (C) Texas Instruments Inc., 1992

  Refer to the file 'license.txt' included with this
  package for usage and license information.
*************************************************************/
/************************************************************/
/* COMMDRVR.C                                               */
/*                                                          */
/* TMS320C3x - COMMON DRIVER ROUTINES                       */
/* :TMS320C3x CODE                                          */
/* Compile and archive into aic.lib                         */
/*                                                          */
/* (C) 1991 TEXAS INSTRUMENTS, HOUSTON                      */
/************************************************************/
#include <commdrvr.h>

/************************************************************/
/* C_INT99(): ERRONEOUS INTERRUPT SERVICE ROUTINE           */
/* THIS ROUTINE IDLES AFTER RECEIVING AN UNEXPECTED INTERRUPT */
/************************************************************/
void c_int99(void)
{
  for(;;);
}

/************************************************************/
/* HEAP_OVERFLOW(): NOT ENOUGH MEMORY IN THE HEAP           */
/* THIS ROUTINE IS AN ERROR HANDLER FOR WHEN MEMORY         */
/* CANNOT BE ALLOCATED FROM THE HEAP                        */
/************************************************************/
void heap_overflow(void)
{
  for(;;);
}

/************************************************************/
/* INIT_C30(): INITIALIZE TMS320C30                         */
/************************************************************/
void init_c30(void)
{
  BUS_ADDR->exp_gcontrol  = 0x0;
  BUS_ADDR->prim_gcontrol = 0x0;
  INIT_XF_PINS;
  ENABLE_CACHE;
}

/************************************************************/
/* ERROR_IN_REAL_TIME(): ERROR HANDLER, PROCESSING TIME IS GREATER */
/* THAN I/O TIME.                                           */
/************************************************************/
void error_in_real_time(void)
{
  for(;;);
}
/************************************************************/
/* CS4215.H                                                 */
/*                                                          */
/* TMS320C3x - CRYSTAL 4215 MM CODEC                        */
/* :TMS320C3x CODE                                          */
/*                                                          */
/* Leor Brenman, DSP Applications                           */
/* (C) 1991 TEXAS INSTRUMENTS, HOUSTON                      */
/************************************************************/
#include <math.h>
#include <stdlib.h>
#include <c30_per.h>
#include <commdrvr.h>
/*                                                          */
/* MACROS                                                   */
/*                                                          */
#define BLOCK_SIZE      64
#define SER_NUM         SERIAL_PORT_ONE
#define TIMER_NUM       TIMER_ONE
#define XF_NUM          1
#define INIT_ARRAYS     init_arrays(buffer_size)
#define WAIT_BUFFERS    while(!buffer_rdy);
#define RESET_FLAGS     buffer_rdy = FALSE
#define RESET_CODEC     TIMER_ADDR(TIMER_NUM)->gcontrol = I_O | HLD_
#define UN_RESET_CODEC  TIMER_ADDR(TIMER_NUM)->gcontrol = I_O | HLD_ | DATOUT
#if XF_NUM
#define DCB_LOW         asm(" AND 2fh,IOF"); asm(" OR 20h,IOF")
#define DCB_HI          asm(" OR 60h,IOF")
#else
#define DCB_LOW         asm(" AND 0F2h,IOF"); asm(" OR 2h,IOF")
#define DCB_HI          asm(" OR 6h,IOF")
#endif
#define WAIT(A)         for(i=0;i<A;i++);
#define C_ISR           ON
Example 8-15. CS4215.h (Continued)

/************************************************************/
/* CS4215 DATA COMMAND BIT FIELD DATA STRUCTURES            */
/************************************************************/
/************************************************************/
/* CONTROL COMMAND                                          */
/************************************************************/
typedef union
{
  unsigned int _intval[2];
  struct
  {
    /* Time slot 4 */
    unsigned int adl   :1;  /* Loopback mode                      */
    unsigned int enl   :1;  /* Enable loopback testing            */
    unsigned int d_r5  :6;  /* Unused - don't care bits: 2 - 7    */
    /* Time slot 3 */
    unsigned int xen   :1;  /* Transmitter enable                 */
    unsigned int xclk  :1;  /* Transmit clock                     */
    unsigned int bsel  :2;  /* Select bit rate                    */
    unsigned int mckf  :2;  /* Clock source select                */
    unsigned int d_r4  :2;  /* Unused - don't care bits: 6 - 7    */
    /* Time slot 2 */
    unsigned int df    :2;  /* Data format selection              */
    unsigned int st    :1;  /* Stereo bit: 0-mono, 1-stereo       */
    unsigned int dfr   :3;  /* Data conversion freq selection     */
    unsigned int d_r3  :2;  /* Unused - don't care bits: 6 - 7    */
    /* Time slot 1 */
    unsigned int d_r1  :2;  /* Unused - don't care bits: 0 - 1    */
    unsigned int dcb   :1;  /* Data control handshake bit         */
    unsigned int d_r2  :5;  /* Unused - don't care bits: 3 - 7    */
    /* Time slot 8 */
    unsigned int d_r9  :8;  /* Unused - don't care bits: 0 - 7    */
    /* Time slot 7 */
    unsigned int rv    :4;  /* Revision level of the CS4215       */
    unsigned int d_r8  :4;  /* Unused - don't care bits: 4 - 7    */
    /* Time slot 6 */
    unsigned int d_r7  :8;  /* Unused - don't care bits: 0 - 7    */
    /* Time slot 5 */
    unsigned int d_r6  :6;  /* Unused - don't care bits: 0 - 5    */
    unsigned int pio   :2;  /* Parallel port control              */
  } _bitval;
} CONTROL;
/************************************************************/
/* DATA COMMANDS                                            */
/************************************************************/
typedef union
{
  unsigned int _intval[2];
  struct
  {
    /* Time slots 3 & 4 */
    signed int right   :16; /* Right channel 16 bit               */
    /* Time slots 1 & 2 */
    signed int left    :16; /* Left channel 16 bit                */
    /* Time slot 8 */
    unsigned int rg    :4;  /* Right input gain settings          */
    unsigned int ma    :4;  /* Monitor path selection             */
    /* Time slot 7 */
    unsigned int lg    :4;  /* Left input gain settings           */
    unsigned int is    :1;  /* Input selection                    */
    unsigned int ovr   :1;  /* Overange                           */
    unsigned int pio   :2;  /* Parallel I/O bits                  */
    /* Time slot 6 */
    unsigned int ro    :6;  /* Right output attenuation setting   */
    unsigned int se    :1;  /* Speaker output enable control      */
    unsigned int d_r1  :1;  /* Unused - don't care bit 7          */
    /* Time slot 5 */
    unsigned int lo    :6;  /* Left output attenuation setting    */
    unsigned int le    :1;  /* Parallel output enable control     */
    unsigned int he    :1;  /* Headphone output enable control    */
  } _bitval;
} STEREO_16;
typedef union
{
  unsigned int _intval[2];
  struct
  {
    /* Time slots 3 & 4 */
    signed int d_r1    :16; /* Unused - don't care bits 0 - 15    */
    /* Time slots 1 & 2 */
    signed int left    :16; /* Left channel 16 bit                */
    /* Time slot 8 */
    unsigned int d_r3  :4;  /* Unused - don't care bits: 0 - 3    */
    unsigned int ma    :4;  /* Monitor path selection             */
    /* Time slot 7 */
    unsigned int lg    :4;  /* Left input gain settings           */
    unsigned int is    :1;  /* Input selection                    */
    unsigned int ovr   :1;  /* Overange                           */
    unsigned int pio   :2;  /* Parallel I/O bits                  */
    /* Time slot 6 */
    unsigned int ro    :6;  /* Right output attenuation setting   */
    unsigned int se    :1;  /* Speaker output enable control      */
    unsigned int d_r2  :1;  /* Unused - don't care bit 7          */
    /* Time slot 5 */
    unsigned int lo    :6;  /* Left output attenuation setting    */
    unsigned int le    :1;  /* Parallel output enable control     */
    unsigned int he    :1;  /* Headphone output enable control    */
  } _bitval;
} MONO_16;

typedef union
{
  unsigned int _intval[2];
  struct
  {
    /* Time slot 4 */
    signed int d_r2    :8;  /* Unused - don't care bits 0 - 7     */
    /* Time slot 3 */
    signed int right   :8;  /* Right channel 8 bit                */
    /* Time slot 2 */
    signed int d_r1    :8;  /* Unused - don't care bits 0 - 7     */
    /* Time slot 1 */
    signed int left    :8;  /* Left channel 8 bit                 */
    /* Time slot 8 */
    unsigned int rg    :4;  /* Right input gain settings          */
    unsigned int ma    :4;  /* Monitor path selection             */
    /* Time slot 7 */
    unsigned int lg    :4;  /* Left input gain settings           */
    unsigned int is    :1;  /* Input selection                    */
    unsigned int ovr   :1;  /* Overange                           */
    unsigned int pio   :2;  /* Parallel I/O bits                  */
    /* Time slot 6 */
    unsigned int ro    :6;  /* Right output attenuation setting   */
    unsigned int se    :1;  /* Speaker output enable control      */
    unsigned int d_r3  :1;  /* Unused - don't care bit 7          */
    /* Time slot 5 */
    unsigned int lo    :6;  /* Left output attenuation setting    */
    unsigned int le    :1;  /* Parallel output enable control     */
    unsigned int he    :1;  /* Headphone output enable control    */
  } _bitval;
} STEREO_8;
typedef union
{
  unsigned int _intval[2];
  struct
  {
    /* Time slots 2 - 4 */
    signed int d_r1    :24; /* Unused - don't care bits 0 - 23    */
    /* Time slot 1 */
    signed int left    :8;  /* Left channel 8 bit                 */
    /* Time slot 8 */
    unsigned int d_r3  :4;  /* Unused - don't care bits: 0 - 3    */
    unsigned int ma    :4;  /* Monitor path selection             */
    /* Time slot 7 */
    unsigned int lg    :4;  /* Left input gain settings           */
    unsigned int is    :1;  /* Input selection                    */
    unsigned int ovr   :1;  /* Overange                           */
    unsigned int pio   :2;  /* Parallel I/O bits                  */
    /* Time slot 6 */
    unsigned int ro    :6;  /* Right output attenuation setting   */
    unsigned int se    :1;  /* Speaker output enable control      */
    unsigned int d_r2  :1;  /* Unused - don't care bit 7          */
    /* Time slot 5 */
    unsigned int lo    :6;  /* Left output attenuation setting    */
    unsigned int le    :1;  /* Parallel output enable control     */
    unsigned int he    :1;  /* Headphone output enable control    */
  } _bitval;
} MONO_8;

typedef union
{
  unsigned int _intval[2];
  CONTROL   control;
  STEREO_16 stereo_16;
  MONO_16   mono_16;
  STEREO_8  stereo_8;
  MONO_8    mono_8;
} CS4215_WORD;
/*                                                          */
/* GLOBAL VARIABLES                                         */
/*                                                          */
extern int  buffer_size;    /* SIZE OF I/O BUFFER(S)                   */
extern VPVF output0;        /* OUTPUT DATA BUFFER FOR PROCESSOR        */
extern VPVF input0;         /* INPUT DATA BUFFER FOR PROCESSOR         */
extern VPVF output_xfer0;   /* OUTPUT DATA BUFFER FOR ISR/AIC          */
extern VPVF input_xfer0;    /* INPUT DATA BUFFER FOR ISR/AIC           */
extern VPVF output1;        /* OUTPUT DATA BUFFER FOR PROCESSOR        */
extern VPVF input1;         /* INPUT DATA BUFFER FOR PROCESSOR         */
extern VPVF output_xfer1;   /* OUTPUT DATA BUFFER FOR ISR/AIC          */
extern VPVF input_xfer1;    /* INPUT DATA BUFFER FOR ISR/AIC           */
extern VI   buffer_rdy;     /* CPU-ISR COMM FLAG (INPUT)               */
extern VI   buffer_index;   /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
extern VI   i;              /* GENERIC COUNTER VARIABLE                */
extern VI   first_half;
extern CS4215_WORD data_control;

/************************************************************/
/* FUNCTION PROTOTYPES                                      */
/************************************************************/
/************************************************************/
/* CS4215 DRIVER FUNCTIONS                                  */
/************************************************************/
void init_arrays(int buffer_size);
void init_4215(int crystal, int sample_rate);
#if SER_NUM
void c_int07(void);
#else
void c_int05(void);
#endif

/************************************************************/
/* CS4215 DATA COMMAND BIT FIELD MACROS                     */
/************************************************************/
/************************************************************/
/* CONTROL COMMAND MACROS                                   */
/************************************************************/
#define DATA                1
#define COMM                0
#define SIXTEEN_BIT_LINEAR  0
#define EIGHT_BIT_U_LAW     1
#define EIGHT_BIT_A_LAW     2
#define MONO_MODE           0
#define STEREO_MODE         1
/* Data conversion Frequency Selections. Assumes that XTAL1 = 24.576 MHz */
/* And XTAL2 = 16.9344 MHz.                                  */
/*                            XTAL1 (kHz) | XTAL2 (kHz)      */
/*                            ===========================    */
#define CONV_FREQ_0       0 /*  8.00000   |  5.5125          */
#define CONV_FREQ_1       1 /* 16.00000   | 11.0250          */
#define CONV_FREQ_2       2 /* 27.42857   | 18.9000          */
#define CONV_FREQ_3       3 /* 32.00000   | 22.0500          */
#define CONV_FREQ_4       4 /* NA         | 37.8000          */
#define CONV_FREQ_5       5 /* NA         | 44.1000          */
#define CONV_FREQ_6       6 /* 48.00000   | 33.0750          */
#define CONV_FREQ_7       7 /*  9.60000   |  6.6150          */

#define CS_ENABLE         0 /* Data output enabled           */
#define CS_DISABLE        1 /* Data output disabled          */
#define CS_TCLOCK_EXT     0 /* FSYNC and SCLK are inputs     */
#define CS_TCLOCK_INT     1 /* FSYNC and SCLK are outputs    */
#define BPF_64            0 /* 64 bits per frame             */
#define BPF_128           1 /* 128 bits per frame            */
#define BPF_256           2 /* 256 bits per frame            */

#define CS_CLOCK_SCLK     0 /* Clock source select: SCLK     */
#define CS_CLOCK_XTAL1    1 /* Clock source select: XTAL1    */
#define CS_CLOCK_XTAL2    2 /* Clock source select: XTAL2    */
#define CS_CLOCK_EXT      3 /* Clock source select: Ext      */

#define DIGITAL_LOOPBACK  0
#define ANALOG_LOOPBACK   1
#define LOOP_ENABLE       1
#define LOOP_DISABLE      0

/************************************************************/
/* DATA COMMAND MACROS                                      */
/************************************************************/
/* Output attenuation is 1.5 dB per unit integer value      */
/*                      Attenuation (dB)                    */
/*                      ================                    */
#define ATT_0    0   /*  0.0 */
#define ATT_1    1   /*  1.5 */
#define ATT_2    2   /*  3.0 */
#define ATT_3    3   /*  4.5 */
#define ATT_4    4   /*  6.0 */
#define ATT_5    5   /*  7.5 */
#define ATT_6    6   /*  9.0 */
#define ATT_7    7   /* 10.5 */
#define ATT_8    8   /* 12.0 */
#define ATT_9    9   /* 13.5 */
#define ATT_10   10  /* 15.0 */
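#define ATT_11   11  /* 16.5 */
#define ATT_12   12  /* 18.0 */
#define ATT_13   13  /* 19.5 */
#define ATT_14   14  /* 21.0 */
#define ATT_15   15  /* 22.5 */
#define ATT_16   16  /* 24.0 */
#define ATT_17   17  /* 25.5 */
#define ATT_18   18  /* 27.0 */
#define ATT_19   19  /* 28.5 */
#define ATT_20   20  /* 30.0 */
#define ATT_21   21  /* 31.5 */
#define ATT_22   22  /* 33.0 */
#define ATT_23   23  /* 34.5 */
#define ATT_24   24  /* 36.0 */
#define ATT_25   25  /* 37.5 */
#define ATT_26   26  /* 39.0 */
#define ATT_27   27  /* 40.5 */
#define ATT_28   28  /* 42.0 */
#define ATT_29   29  /* 43.5 */
#define ATT_30   30  /* 45.0 */
#define ATT_31   31  /* 46.5 */
#define ATT_32   32  /* 48.0 */
#define ATT_33   33  /* 49.5 */
#define ATT_34   34  /* 51.0 */
#define ATT_35   35  /* 52.5 */
#define ATT_36   36  /* 54.0 */
#define ATT_37   37  /* 55.5 */
#define ATT_38   38  /* 57.0 */
#define ATT_39   39  /* 58.5 */
#define ATT_40   40  /* 60.0 */
#define ATT_41   41  /* 61.5 */
#define ATT_42   42  /* 63.0 */
#define ATT_43   43  /* 64.5 */
#define ATT_44   44  /* 66.0 */
#define ATT_45   45  /* 67.5 */
#define ATT_46   46  /* 69.0 */
#define ATT_47   47  /* 70.5 */
#define ATT_48   48  /* 72.0 */
#define ATT_49   49  /* 73.5 */
#define ATT_50   50  /* 75.0 */
#define ATT_51   51  /* 76.5 */
#define ATT_52   52  /* 78.0 */
#define ATT_53   53  /* 79.5 */
#define ATT_54   54  /* 81.0 */
#define ATT_55   55  /* 82.5 */
#define ATT_56   56  /* 84.0 */
#define ATT_57   57  /* 85.5 */
#define ATT_58   58  /* 87.0 */
#define ATT_59   59  /* 88.5 */
#define ATT_60   60  /* 90.0 */
#define ATT_61   61  /* 91.5 */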
#define ATT_62   62  /* 93.0 */
#define ATT_63   63  /* 94.5 */
#define HEADPHONE_OFF    0
#define HEADPHONE_ON     1
#define LINE_OUT_OFF     0
#define LINE_OUT_ON      1
#define SPEAKER_OFF      0
#define SPEAKER_ON       1
/* Input gain is 1.5 dB per unit integer value              */
/*                      Gain (dB)                           */
/*                      =========                           */
#define GAIN_0   0   /*  0.0 */
#define GAIN_1   1   /*  1.5 */
#define GAIN_2   2   /*  3.0 */
#define GAIN_3   3   /*  4.5 */
#define GAIN_4   4   /*  6.0 */
#define GAIN_5   5   /*  7.5 */
#define GAIN_6   6   /*  9.0 */
#define GAIN_7   7   /* 10.5 */
#define GAIN_8   8   /* 12.0 */
#define GAIN_9   9   /* 13.5 */
#define GAIN_10  10  /* 15.0 */
#define GAIN_11  11  /* 16.5 */
#define GAIN_12  12  /* 18.0 */
#define GAIN_13  13  /* 19.5 */
#define GAIN_14  14  /* 21.0 */
#define GAIN_15  15  /* 22.5 */
#define LINE_IN          0
#define MIKE_IN          1
#define OVERANGE_ENABLE  1
#define OVERANGE_CLEAR   0
/* Monitor path attenuation = 6 dB per unit integer value   */
/*                      Gain (dB)                           */
/*                      =========                           */
#define MATT_0   0   /*  6.0 */
#define MATT_1   1   /* 12.0 */
#define MATT_2   2   /* 18.0 */
#define MATT_3   3   /* 24.0 */
#define MATT_4   4   /* 30.0 */
#define MATT_5   5   /* 36.0 */
#define MATT_6   6   /* 42.0 */
#define MATT_7   7   /* 48.0 */
#define MATT_8   8   /* 54.0 */
#define MATT_9   9   /* 60.0 */
#define MATT_10  10  /* 66.0 */
#define MATT_11  11  /* 72.0 */
#define MATT_12  12  /* 78.0 */
#define MATT_13  13  /* 84.0 */
#define MATT_14  14  /* 90.0 */
#define MATT_15  15  /* 96.0 (Mute Monitor Path) */
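The ATT_n macros in CS4215.h encode output attenuation in 1.5-dB steps, with codes 0 through 63. The following is a minimal sketch of how a requested attenuation could be mapped to the nearest code. The helper names att_code_from_tenths_db and att_tenths_db_from_code are hypothetical and do not appear in the original header; working in tenths of a decibel is an assumption made here to avoid floating-point arithmetic.

```c
/* Hypothetical helpers, not part of CS4215.h. */
#define ATT_STEP_TENTHS_DB 15   /* 1.5 dB per code, in tenths of a dB  */
#define ATT_MAX_CODE       63   /* largest ATT_n defined in the header */

/* Map a requested attenuation (tenths of a dB) to the nearest code,
   clamping to the largest code the header defines. */
unsigned att_code_from_tenths_db(unsigned tenths_db)
{
    unsigned code = (tenths_db + ATT_STEP_TENTHS_DB / 2) / ATT_STEP_TENTHS_DB;
    return code > ATT_MAX_CODE ? ATT_MAX_CODE : code;
}

/* Inverse mapping: attenuation in tenths of a dB for a given code. */
unsigned att_tenths_db_from_code(unsigned code)
{
    return code * ATT_STEP_TENTHS_DB;
}
```

For example, a request of 22.5 dB (225 tenths) maps to code 15, which matches ATT_15 in the table, and any request above 94.5 dB is clamped to code 63.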
/*************************************************************
  cs4215.c
  staff
  05-13-92
  (C) Texas Instruments Inc., 1992

  Refer to the file 'license.txt' included with this
  package for usage and license information.
*************************************************************/
/************************************************************/
/* CS4215.C                                                 */
/*                                                          */
/* TMS320C3x - CRYSTAL 4215 MM CODEC                        */
/* :TMS320C3x CODE                                          */
/* Compile and archive into CS4215.lib                      */
/*                                                          */
/* Leor Brenman, DSP Applications                           */
/* (C) 1991 TEXAS INSTRUMENTS, HOUSTON                      */
/************************************************************/
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <cs4215.h>

/************************************************************/
/* GLOBAL VARIABLES                                         */
/************************************************************/
int  buffer_size = BLOCK_SIZE;  /* SIZE OF I/O BUFFER(S)                   */
VPVF output0;                   /* OUTPUT DATA BUFFER FOR PROCESSOR        */
VPVF input0;                    /* INPUT DATA BUFFER FOR PROCESSOR         */
VPVF output_xfer0;              /* OUTPUT DATA BUFFER FOR ISR/CODEC        */
VPVF input_xfer0;               /* INPUT DATA BUFFER FOR ISR/CODEC         */
VPVF output1;                   /* OUTPUT DATA BUFFER FOR PROCESSOR        */
VPVF input1;                    /* INPUT DATA BUFFER FOR PROCESSOR         */
VPVF output_xfer1;              /* OUTPUT DATA BUFFER FOR ISR/CODEC        */
VPVF input_xfer1;               /* INPUT DATA BUFFER FOR ISR/CODEC         */
VI   buffer_rdy = FALSE;        /* CPU-ISR COMM FLAG (INPUT)               */
VI   buffer_index = 0;          /* INDEX INTO INPUT AND OUTPUT DATA ARRAYS */
VI   first_half = TRUE;
VI   i;                         /* GENERIC COUNTER VARIABLE                */
CS4215_WORD data_control;

#if C_ISR


Example 8-16. CS4215.c (Continued)

/************************************************************/
/* C_INT06() OR C_INT08()                                   */
/* SERIAL PORT 0/1 RECEIVE INTERRUPT SERVICE ROUTINE        */
/************************************************************/
#if SER_NUM
void c_int06(void) {}
void c_int08(void)
#else
void c_int08(void) {}
void c_int06(void)
#endif
{
  VPVF swap;
  CS4215_WORD in,out;

  if(first_half)          /* First half of the 64 bit transmission */
  {
    first_half = FALSE;
    in._intval[0] = SERIAL_PORT_ADDR(SER_NUM)->r_data;

    input_xfer0[buffer_index] = in.stereo_16._bitval.right;
    input_xfer1[buffer_index] = in.stereo_16._bitval.left;

    out.stereo_16._bitval.left  = output_xfer1[buffer_index];
    out.stereo_16._bitval.right = output_xfer0[buffer_index];
    SERIAL_PORT_ADDR(SER_NUM)->x_data = out._intval[0];

    if(++buffer_index == buffer_size)
    {
      swap         = input0;
      input0       = input_xfer0;
      input_xfer0  = swap;

      swap         = input1;
      input1       = input_xfer1;
      input_xfer1  = swap;

      swap         = output0;
      output0      = output_xfer0;
      output_xfer0 = swap;

      swap         = output1;
      output1      = output_xfer1;
      output_xfer1 = swap;

      buffer_index = 0;
      buffer_rdy   = TRUE;
    }
  }
  else                    /* Second half of transmission */
  {
    SERIAL_PORT_ADDR(SER_NUM)->r_data;
    SERIAL_PORT_ADDR(SER_NUM)->x_data = data_control._intval[1];
    first_half = TRUE;
  }
}
#endif /* C_ISR */


/*                                                          */
/* INIT_ARRAYS(): INITIALIZE DATA ARRAY PARAMETERS          */
/*                                                          */
void init_arrays(int buffer_size)
{
  int i;

  /*--------------------------------------------------------*/
  /* INITIALIZE AND ZERO FILL ARRAYS                        */
  /*--------------------------------------------------------*/
  if(!(input0 = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if(!(output0 = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if(!(input_xfer0 = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if(!(output_xfer0 = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if(!(input1 = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if(!(output1 = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if(!(input_xfer1 = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();
  if(!(output_xfer1 = (float *) calloc(buffer_size, sizeof(float))))
    heap_overflow();

  for(i = 0; i < buffer_size; i++)
  {
    output0[i] = output_xfer0[i] = 0.0;
    output1[i] = output_xfer1[i] = 0.0;
  }
}
/************************************************************/
/* INIT_4215(): INITIALIZE COMMUNICATIONS TO CS4215         */
/* NOTE: i IS A VOLATILE TO FORCE TIME DELAYS AND TO FORCE  */
/*       READS OF SERIAL PORT DATA RECEIVE REGISTER TO CLEAR */
/*       THE RECEIVE INTERRUPT FLAG                         */
/************************************************************/
void init_4215(int crystal, int sample_rate)
{
  VI i,j,dummy;
  CS4215_WORD temp,in,out;

  RESET_CODEC;            /* RESET AIC */
  WAIT(50);               /* KEEP RESET LOW FOR SOME PERIOD OF TIME */

  /**********************************************************/
  /* CONFIGURE SERIAL PORT 1                                */
  /**********************************************************/
  SERIAL_PORT_ADDR(SER_NUM)->gcontrol      = 0x0;
  SERIAL_PORT_ADDR(SER_NUM)->s_x_control   = CLKXFUNC | DXFUNC | FSXFUNC;
  SERIAL_PORT_ADDR(SER_NUM)->s_r_control   = CLKRFUNC | DRFUNC | FSRFUNC;
  SERIAL_PORT_ADDR(SER_NUM)->s_rxt_control = XGO | XHLD_ | XCP_ | XCLKSRC;
  /* THE FOLLOWING PERIOD REGISTER VALUE HAS BEEN TESTED ON A 50 MHz C30 */
  SERIAL_PORT_ADDR(SER_NUM)->s_rxt_period_bit.x_period = 0x3;
  SERIAL_PORT_ADDR(SER_NUM)->gcontrol = XCLKSRCE | XLEN_32 | XFSM | RFSM |
                                        RLEN_32 | XINT | RINT |
                                        FSXOUT | RRESET | XRESET;

  /* BUILD CONTROL WORDS */
  /* ALL BITS ARE 0 EXCEPT THOSE DEFINED OTHERWISE */
  temp._intval[0] = temp._intval[1] = 0;
  temp.control._bitval.st   = STEREO_MODE;
  temp.control._bitval.dfr  = sample_rate;
  temp.control._bitval.xclk = 1;
  temp.control._bitval.mckf = crystal;
  temp.control._bitval.pio  = 3;


/* BUILD DAT 


data_control 
data_control 


data_control 
data_control.stereo_16._bitval. 


data_control.stereo_16._bitval. 


[TA CONTROL WORD */ 
-_intval[0O] = 
l.stereo_16._bitval. 
l.stereo_16._bitval. 


data_control.stereo_16._bitval 


lo 
le 
ro 
ovr 


-mMa 


data_control._intval[1] = 


= ON; 


0; 
ATT 


ATT 
ON; 
MATT_15; 
  UN_RESET_CODEC;         /* PULL 4215 OUT OF RESET */
  DCB_LOW;
  /* Write out control word until dcb bit is low */
  do
  {
    out = temp;
    for(i=0;i<5;i++)
    {
      while(SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xsrempty == 1);
      SERIAL_PORT_ADDR(SER_NUM)->gcontrol = 0x0;
      /* See note on XRESET/RRESET and three cycle delay in C3x U.G. */
      for(j=0;j<3;j++);
      SERIAL_PORT_ADDR(SER_NUM)->gcontrol = XCLKSRCE | XLEN_32 | XFSM |
                                            RFSM | RLEN_32 | XINT | RINT |
                                            FSXOUT | RRESET | XRESET;
      dummy = SERIAL_PORT_ADDR(SER_NUM)->r_data;
      SERIAL_PORT_ADDR(SER_NUM)->x_data = out._intval[0];
      /* See note on XRDY and three cycle delay in C3x U.G. */
      for(j=0;j<3;j++);
      while(SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xrdy == 0);
      SERIAL_PORT_ADDR(SER_NUM)->x_data = out._intval[1];
      while(SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.rrdy == 0);
      in._intval[0] = SERIAL_PORT_ADDR(SER_NUM)->r_data;
      /* See note on RRDY and three cycle delay in C3x U.G. */
      for(j=0;j<3;j++);
      while(SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.rrdy == 0);
      in._intval[1] = SERIAL_PORT_ADDR(SER_NUM)->r_data;
    }
  } while(in.control._bitval.dcb != 0);


  /* Write out control word twice with the dcb bit high */
  temp.control._bitval.dcb = 1;
  out = temp;
  for(i=0;i<2;i++)
  {
    while(SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xsrempty == 1);
    SERIAL_PORT_ADDR(SER_NUM)->gcontrol = 0x0;
    /* See note on XRESET/RRESET and three cycle delay in C3x U.G. */
    for(j=0;j<3;j++);
    SERIAL_PORT_ADDR(SER_NUM)->gcontrol = XCLKSRCE | XLEN_32 | XFSM |
                                          RFSM | RLEN_32 | XINT | RINT |
                                          FSXOUT | RRESET | XRESET;
    dummy = SERIAL_PORT_ADDR(SER_NUM)->r_data;
    SERIAL_PORT_ADDR(SER_NUM)->x_data = out._intval[0];
    /* See note on XRDY and three cycle delay in C3x U.G. */
    for(j=0;j<3;j++);
    while(SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xrdy == 0);
    SERIAL_PORT_ADDR(SER_NUM)->x_data = out._intval[1];
    while(SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.rrdy == 0);
    in._intval[0] = SERIAL_PORT_ADDR(SER_NUM)->r_data;
    /* See note on RRDY and three cycle delay in C3x U.G. */
    for(j=0;j<3;j++);
    while(SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.rrdy == 0);
    in._intval[1] = SERIAL_PORT_ADDR(SER_NUM)->r_data;
  }

  SERIAL_PORT_ADDR(SER_NUM)->gcontrol = 0x0;
  SERIAL_PORT_ADDR(SER_NUM)->gcontrol = XLEN_32 | RLEN_32 | XFSM | RFSM |
                                        RRESET | XRESET | XCLKSRCE;
  while(SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xrdy == 0);
  SERIAL_PORT_ADDR(SER_NUM)->x_data = 0;
  /* See note on XRDY and three cycle delay in C3x U.G. */
  for(j=0;j<3;j++);
  while(SERIAL_PORT_ADDR(SER_NUM)->gcontrol_bit.xrdy == 0);
  SERIAL_PORT_ADDR(SER_NUM)->x_data = data_control._intval[1];
  dummy = SERIAL_PORT_ADDR(SER_NUM)->r_data;
  SERIAL_PORT_ADDR(SER_NUM)->gcontrol |= XINT | RINT;
  SERIAL_PORT_ADDR(SER_NUM)->gcontrol &= ~XCLKSRCE;
  SERIAL_PORT_ADDR(SER_NUM)->s_rxt_control = 0;

  CL_INT_FL_REG;

#if SER_NUM
  EN_SER_PORT_RCV_INT_1;
#else
  EN_SER_PORT_RCV_INT_0;
#endif

  EN_GLOBAL_INTS;

  DCB_HI;
}
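The interrupt service routine in CS4215.c hands data to the main program by swapping buffer pointers once a block is full, and the main program polls buffer_rdy. The following is a simplified, host-runnable sketch of that ping-pong scheme for a single channel. The type pingpong_t, the functions pingpong_init and pingpong_push, and the block size of 4 are illustrative assumptions; the original driver uses global pointers and BLOCK_SIZE = 64.

```c
#include <stddef.h>

#define BLOCK 4   /* illustrative block size; the listing uses 64 */

/* Hypothetical container for one ping-pong buffer pair. */
typedef struct {
    float *cpu;          /* block the main loop may process (like input0)   */
    float *xfer;         /* block the ISR is filling (like input_xfer0)     */
    size_t index;        /* next slot the ISR writes (like buffer_index)    */
    volatile int ready;  /* set when a full block is swapped (buffer_rdy)   */
} pingpong_t;

void pingpong_init(pingpong_t *p, float *a, float *b)
{
    p->cpu = a;
    p->xfer = b;
    p->index = 0;
    p->ready = 0;
}

/* Called once per received sample, as the ISR would be. */
void pingpong_push(pingpong_t *p, float sample)
{
    p->xfer[p->index] = sample;
    if (++p->index == BLOCK) {
        float *swap = p->cpu;   /* exchange the two pointers */
        p->cpu = p->xfer;
        p->xfer = swap;
        p->index = 0;
        p->ready = 1;           /* tell the main loop a block is available */
    }
}
```

Because only pointers are exchanged, the hand-over cost is constant regardless of block size. The main loop is expected to finish processing the cpu block and clear the flag before the next block fills, which is the condition that error_in_real_time() exists to report in the original code.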
8.7 Software UART Emulator for the TMS320C3x

By using the general-purpose I/O pins in conjunction with two timers and an
external interrupt, you can develop a very flexible full-duplex universal
asynchronous receiver/transmitter (UART) emulator in software. This section
discusses the implementation of an interrupt-driven, 9600-baud UART with eight
data bits, one stop bit, and no parity. This solution was contributed by Ted Fried
of Advanced Computer Communications.


8.7.1. Hardware

The hardware interface is relatively straightforward (see Figure 8-12). The
receive line is connected to both the INT0 and IOF1 pins, so that the falling
edge of the start bit triggers an interrupt. The transmit line is connected to the
IOF0 pin and a pullup resistor.


8.7.2. Software


As shown in Example 8-17, the receive sequence begins when the start bit
triggers the external interrupt. In the interrupt service routine, RX_INT0, timer0
is loaded with a value that results in a delay of one half of the bit time. The
routine then loads the timer's interrupt vector, enables the timer interrupt, and
exits to the main program. When the timer triggers its interrupt, RX_TMR_INT,
the main body of the receive code executes. At this time, the line is in the
middle of the start bit. The CPU samples IOF1 to verify that the start bit is
present. If the start bit is verified, the timer is loaded with the full-bit time and
started. The procedure then exits to the main program.


On successive timer0 interrupts, handled by RX_TMR_INT, the received bits are
shifted into a storage area in memory until a byte is read in. On the ninth
interrupt, if the stop bit is verified, the routine executes a software trap to
inform the main program of the byte reception. If the stop bit is not verified,
the BAD_STOP_BIT subroutine is called, where the appropriate action is taken.
After the received byte is processed, the external interrupt is reenabled and
the system waits for the next start bit.
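
The receive sequence described above can be sketched in portable C as a small state machine. This is only an illustrative model, not a translation of Example 8-17: the type and function names (soft_uart_rx_t, rx_on_start_edge, rx_on_timer) are hypothetical, and it assumes standard framing with eight data bits sent least significant bit first.

```c
#include <stdint.h>

/* Hypothetical model of the receive sequence; all names are illustrative. */
typedef enum { RX_IDLE, RX_START, RX_DATA, RX_STOP } rx_phase_t;
typedef enum { RX_PENDING, RX_BYTE_READY, RX_BAD_START, RX_BAD_STOP } rx_result_t;

typedef struct {
    rx_phase_t phase;     /* where we are in the frame        */
    uint8_t    shift;     /* data bits collected so far       */
    int        bits_left; /* data bits still expected         */
} soft_uart_rx_t;

/* Falling edge on the receive line (the external interrupt). */
void rx_on_start_edge(soft_uart_rx_t *rx)
{
    rx->phase = RX_START;
    rx->shift = 0;
    rx->bits_left = 8;
}

/* One timer interrupt, given the sampled line level (0 or 1). */
rx_result_t rx_on_timer(soft_uart_rx_t *rx, int level, uint8_t *out)
{
    switch (rx->phase) {
    case RX_START:                      /* middle of the start bit */
        if (level != 0) {
            rx->phase = RX_IDLE;
            return RX_BAD_START;
        }
        rx->phase = RX_DATA;
        return RX_PENDING;
    case RX_DATA:                       /* shift in at the top; LSB arrives first */
        rx->shift = (uint8_t)((rx->shift >> 1) | (level ? 0x80u : 0u));
        if (--rx->bits_left == 0)
            rx->phase = RX_STOP;
        return RX_PENDING;
    case RX_STOP:                       /* stop bit must be high */
        rx->phase = RX_IDLE;
        if (level != 1)
            return RX_BAD_STOP;
        *out = rx->shift;
        return RX_BYTE_READY;
    default:                            /* idle: nothing to do */
        return RX_PENDING;
    }
}
```

Calling rx_on_start_edge once and then rx_on_timer ten times (start bit, eight data bits, stop bit) yields RX_BYTE_READY on the last call when the frame is valid.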


The transmit routine begins when the main program loads a byte into the
holding register and then calls TX_MAIN. This procedure loads timer1 with the
full-bit time value, resets the transmit counter, sets the start bit, and
enables the timer's interrupt. The routine then returns to the main program.
The main program does not request another byte transmission until it finds the
transmit counter equal to 0. On each subsequent timer1 interrupt, TX_INT, the
routine shifts out the next bit of the transmit byte, including the stop bit,
until the transmit counter is 0.
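
The framing performed by the transmit routine can be illustrated with a short C sketch. It is a hedged model rather than the code of Example 8-17: the function name tx_frame_levels is hypothetical, and the sketch simply lists the line level for each bit time, assuming a low start bit, eight data bits sent least significant bit first, and a high stop bit supplied by ones placed above the data byte.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill levels[] with the transmit-line level for each bit time of one frame:
 * start bit (0), eight data bits LSB first, stop bit (1).
 * Returns the number of bit times written (always 10).
 */
size_t tx_frame_levels(uint8_t byte, int levels[10])
{
    /* Ones above the data byte are shifted out last and act as the stop bit. */
    uint16_t shift = (uint16_t)(byte | 0xFF00u);
    size_t n = 0;

    levels[n++] = 0;                  /* start bit */
    for (int i = 0; i < 9; i++) {     /* eight data bits plus the stop bit */
        levels[n++] = (int)(shift & 1u);
        shift >>= 1;
    }
    return n;
}
```

Each entry of levels[] corresponds to one timer period, so a ten-entry frame at 9600 baud occupies a little over 1 ms on the line.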




Example 8-17. Full Duplex UART Emulator for TMS320C3x


half_bit_time    .set   01ADh           ; assume 33-MHz TMS320C3x
whole_bit_time   .set   0358h
timer_go         .set   03C1h
timer_setup      .set   0?D1h
int_setup        .set   0301h
iof_setup        .set   06h

timer0_vector    .word  RX_TMR_INT      ; interrupt vector addresses
timer1_vector    .word  TX_INT
rx_int_vector    .word  RX_INT0

timer0_period    .word  0808028h        ; on-chip RAM locations
timer1_period    .word  0808038h
timer0_control   .word  0808020h
timer1_control   .word  0808030h
timer0_int_vect  .word  0809FC9h
timer1_int_vect  .word  0809FCAh
int0_vector      .word  0809FC1h
rx_byte          .word  0809FF8h
tx_byte          .word  0809FF9h
rx_counter       .word  0809FFAh
tx_counter       .word  0809FFBh


; Main setup for asynchronous serial interface to be run at powerup.

SETUP_ASYNCH:    PUSH   AR7
                 OR     iof_setup, IOF          ; iof setup and iof0=1
                 LDI    timer_setup, AR7        ; setup timer0 and timer1
                 STI    AR7, @timer0_control    ;
                 STI    AR7, @timer1_control    ;
                 LDI    rx_int_vector, AR7      ; load int0 interrupt vector
                 STI    AR7, @int0_vector       ;
                 OR     int_setup, IE           ; enable interrupts
                 POP    AR7
                 RETS


; Start bit received. External interrupt service routine.

RX_INT0:         PUSH   AR7
                 XOR    01h, IE                 ; disable int0
                 LDI    half_bit_time, AR7      ;
                 STI    AR7, @timer0_period     ; rx_timer period
                 LDI    timer0_vector, AR7      ;
                 STI    AR7, @timer0_int_vect   ; rx_timer int vector
                 LDI    timer_go, AR7           ;
                 STI    AR7, @timer0_control    ; start rx_timer
                 LDI    0Ah, AR7                ;
                 STI    AR7, @rx_counter        ; reset rx_counter
                 POP    AR7
                 RETI

; Timer0 interrupt service routine for byte reception.

RX_TMR_INT:      PUSH   AR7
                 LDI    @rx_counter, AR7        ;
                 CMPI   09h, AR7                ; are we at start bit?
                 BNE    STOP                    ; nope, check for stop bit
                 CMPI   080h, IOF               ; check rx_bit (IOF1)
                 BLT    OK                      ; if less than 80h (IOF1=0)?
                 OR     01h, IE                 ; bad start bit, reenable INT0
                 BR     CLEANUP2                ; go back to main
OK:              SUBI   1h, AR7                 ; decrement rx_counter
                 STI    AR7, @rx_counter        ; update counter in memory
                 LDI    whole_bit_time, AR7     ;
                 STI    AR7, @timer0_period     ; load bit time into rx_timer
                 LDI    timer_go, AR7           ;
                 STI    AR7, @timer0_control    ; start rx_timer
                 POP    AR7
                 RETI
STOP:            PUSH   AR6
                 LDI    @rx_byte, AR6           ;
                 DBNZ   AR7, NEXT               ; if rx_count !=0, get next bit
                 CMPI   080h, IOF               ; check rx_bit (IOF1)
                 CALLLT BAD_STOP_BIT            ; GO TO INVALID STOP BIT MODULE
                 LSH    -24, AR6                ; shift rx_byte 24 bits right
                                                ; TRAP RECEIVED BYTE!!
                 STI    AR6, @rx_byte           ;
                 OR     01h, IE                 ; reenable INT0
                 BR     CLEANUP                 ;
NEXT:            CMPI   80h, IOF                ; check rx_bit (IOF1)
                 OR     1h, ST                  ; force carry flag to 1
                 BGE    ONE                     ; if rx_bit = 1
                 XOR    1h, ST                  ; set carry flag to 0
ONE:             RORC   AR6                     ; shift in carry bit
                 STI    AR6, @rx_byte           ; update rx_byte in memory
                 STI    AR7, @rx_counter        ; update counter in memory
                 LDI    timer_go, AR6           ;
                 STI    AR6, @timer0_control    ; start rx_timer
CLEANUP:         POP    AR6
CLEANUP2:        POP    AR7
                 RETI

; Transmit byte main subroutine

TX_MAIN:         PUSH   AR7
                 LDI    whole_bit_time, AR7     ;
                 STI    AR7, @timer1_period     ; load timer period
                 LDI    timer1_vector, AR7      ;
                 STI    AR7, @timer1_int_vect   ; tx_timer int vector
                 LDI    @tx_byte, AR7           ;
                 OR     0FF00h, AR7             ; mask stop bit to tx_byte
                 STI    AR7, @tx_byte           ; update tx_byte
                 AND    0FBh, IOF               ; send out '0' to IOF0
                 LDI    0Ah, AR7                ;
                 STI    AR7, @tx_counter        ; load counter in memory
                 LDI    timer_go, AR7           ;
                 STI    AR7, @timer1_control    ; start tx_timer
                 POP    AR7
                 RETS


; Timer1 interrupt service routine for byte transmission.

TX_INT:          PUSH   AR7
                 LDI    @tx_counter, AR7        ; load in tx_counter from mem
                 DBNZ   AR7, NEXT_OUT           ; if tx_counter not zero
                 POP    AR7
                 RETI
NEXT_OUT:        PUSH   AR6
                 LDI    timer_go, AR6           ; use AR6 so the counter in AR7 is preserved
                 STI    AR6, @timer1_control    ; start tx_timer
                 LDI    @tx_byte, AR6           ; load in tx_byte from mem
                 RORC   AR6                     ; next bit out is in carry
                 BNC    OUT_ZERO                ; carry=0, then send out '0'
                 OR     04h, IOF                ; send out '1' to IOF0
                 BR     CLEANUP3                ;
OUT_ZERO:        AND    0FBh, IOF               ; send out '0' to IOF0
CLEANUP3:        STI    AR6, @tx_byte           ; update byte in memory
                 STI    AR7, @tx_counter        ; update counter in memory
                 POP    AR6
                 POP    AR7
                 RETI
8.8 Hardware UART for TMS320C3x


Section 8.7 discusses a software UART emulator, which allows the ’C3x to per- 
form asynchronous communication. There are some applications that require 
a hardware UART. This section describes one possible design for a hardware 
UART (see Figure 8-12). This design, originally done in a field programmable 
gate array (FPGA), can be easily transferred to an application specific inte- 
grated circuit (ASIC). You can modify this design to accommodate faster data 
rates or different communication protocols. 


Figure 8-12. TMS320C3x Serial Port to UART Interface

[Figure: block diagram; recoverable labels: FSR0 logic, H3, TX, RX]


Figure 8-13 shows a 9600-baud UART with one stop bit and one start bit. The
clock signal, H3, is supplied to the circuit from the ’C3x. The DSP uses a
25-MHz clock.


Figure 8-13. Transmit Circuitry

[Figure: logic diagram; recoverable labels: CLKX0, H3, XEN, stop bit, modulus-8 binary counter]


The ’C3x serial port transmit circuitry, shown in Figure 8-13, is configured to
output eight bits of data at a rate of approximately 9.6 kHz. This is achieved
by using one of the ’C30’s internal timers and programming it to the desired
9.6-kHz frequency. The transmitting port is configured in the fixed burst mode.
This allows the leading FSX signal to help initiate a start bit for the UART
protocol. The stop bit is generated at the end of the eighth bit by the UART
circuitry.


The receive circuitry of the UART, shown in Figure 8—14, is activated when the 
circuit detects the start bit. The start bit is a logical 0. The delay circuit is acti- 
vated on the falling edge of the start bit. The delay causes sampling of the 
incoming data bits to occur in the middle of each bit, thus, increasing the 
UART’s noise immunity. 
Figure 8-14. Receive Circuitry


After the delay is performed, the timer is activated. The timer has a period of
104 µs, which corresponds to a baud rate of approximately 9.6 kHz. At each
bit time, a data value is sampled into an 8-bit shift register. After all eight bits
are received, the data is passed to the ’C30 over the serial port at 1/8 of the
H3 clock rate. The FPGA circuitry interfaces the ’C30 in the fixed burst mode
of operation to the serial port. Both the clock and the frame sync signals are
generated by the FPGA circuitry.
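
The 104-µs timer period follows directly from the baud rate. The C sketch below shows the arithmetic; the function names and the timer clock used in the usage note are assumptions made for illustration and are not taken from the design described here.

```c
/* Duration of one bit, in microseconds, at the given baud rate. */
double bit_time_us(double baud)
{
    return 1.0e6 / baud;
}

/*
 * Number of timer input clocks per bit, rounded to the nearest count,
 * for an assumed timer input clock (timer_clock_hz is a hypothetical
 * parameter; use whatever clock actually feeds the timer).
 */
unsigned long counts_per_bit(double timer_clock_hz, double baud)
{
    return (unsigned long)(timer_clock_hz / baud + 0.5);
}
```

For 9600 baud, bit_time_us gives roughly 104.17 µs. With an assumed 6.25-MHz timer input clock, counts_per_bit returns 651.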


This UART circuitry can also easily be designed to function as an ASIC or can
be incorporated into a custom digital signal processor (CDSP). This circuit can
be modified for different serial communication protocols or for higher baud
rates.


Chapter 9 


Clock Oscillator and Ceramic Resonators 


This chapter provides a general background on oscillators as well as informa- 
tion regarding crystal and ceramic resonators, their frequency characteristics, 
and the type of oscillator circuit used on the ’C3x. Also covered are design as- 
pects of the ’C3x oscillator, including appropriate configuration of the external 
components, measured parameters for the on-board portion of the circuitry, 
use of the oscillator with overtone crystals, and general design considerations 
for choosing the external components for the oscillator. Finally, this chapter 
shows some design solutions for common frequencies. 
9.1 Oscillators


The ’C3x is a member of the Texas Instruments family of high-speed DSPs.
The ’C3x is capable of performing operations at a rate of up to 30 million
instructions per second (MIPS). The wide variety of DSP applications requires
a wide range of clocking frequencies. The ’C3x allows considerable flexibility
in meeting these clocking requirements.


The ’C3x provides two modes for clock generation and control to suit
different application needs:

- External clock input with the capability to divide the clock frequency by 2

- Internal clock generation from an on-board oscillator with no external clock
  necessary (’C30 and ’C31 only)


The built-in oscillator provides a method for accurate clock generation that re- 
quires few external components (a crystal or ceramic resonator and two load 
capacitors). This saves board space and reduces system cost. 


On the ’C3x devices, the on-board oscillator operates in a divide-by-2 mode. 
In this mode, the frequency of H1 or H3 (which indicates the actual machine 
cycles of the processor) is one half of the oscillator frequency. 


9.1.1 Recommendations for Oscillator Use 
The ’C3x family of devices provides several clock generation options based
on cost, component count, and the required clock frequency for the applica- 
tion. The oscillator clocking option on the ’C3x provides a low-cost method of 
clock generation with as few as three external components (one crystal and 
two load capacitors), which helps to minimize board space consumed for clock 
generation. The crystal or ceramic resonator used determines the frequency 
of operation. This frequency can extend up to 60 MHz with third-overtone crys- 
tals. 


CMOS-compatible integrated-circuit crystal oscillators are available across a 
wide frequency range. These are more expensive than the internal oscillator 
and usually consume more space on the board. CMOS oscillators also be- 
come more expensive with higher operating frequency. 
9.2 Quartz Crystal and Ceramic Resonators


All oscillators require resonating components to determine the frequency of 
oscillation. A resonating component reacts more strongly within a certain fre- 
quency range than at other frequencies outside that range. A simple resonator 
consists of an inductor (L) and a capacitor (C). These components resonate 
or favor the frequency at which their individual reactances cancel each other. 
Figure 9-1 shows a simple series-LC resonator with impedance equations. 


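Figure 9-1. Series-LC Schematic

[Figure: inductor L in series with capacitor C]

The impedance equations for the series-LC schematic are as follows:

$$Z_L = j\omega L, \qquad Z_C = \frac{1}{j\omega C}, \qquad Z_T = Z_L + Z_C = j\left(\omega L - \frac{1}{\omega C}\right)$$

$Z_T$ is minimum where $\omega L = 1/(\omega C)$:

$$\omega_s^2 = \frac{1}{LC} \quad\Rightarrow\quad \omega_s = \frac{1}{\sqrt{LC}}$$


Consider the impedance of the series combination of these components. The
impedance of the inductor is $Z_L = j\omega L$, where $\omega$ is the angular
frequency ($\omega = 2\pi f$), and the impedance of the capacitor is
$Z_C = 1/(j\omega C)$. The total impedance of the inductor-capacitor
combination is $Z_T = Z_L + Z_C = j(\omega L - 1/(\omega C))$. Therefore, the
magnitude of the combined impedance of these two components is a minimum
at the frequency where $\omega L = 1/(\omega C)$. This frequency ($\omega_s$)
is the resonant frequency and is determined by:

$$\omega_s L = \frac{1}{\omega_s C}, \quad \text{so} \quad \omega_s^2 = \frac{1}{LC}$$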


Although oscillators frequently consist of different combinations of inductors 
and capacitors as resonating elements, the accuracy of the frequency control 
with these components is limited. Changes in the values of L and C due to tol- 
erance limitations and changes in the environment (such as temperature) 
strongly affect the frequency of the oscillator. Many applications in digital sys- 
tems require precise clock timing and need more accurate resonators. Quartz 
crystal and ceramic resonators can provide a more stable and precise fre- 
quency control. 
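
As a worked illustration of why LC resonators are imprecise, consider the standard series-resonance relation with hypothetical component values (L = 1 µH and C = 100 pF are chosen only to make the arithmetic easy and do not come from any ’C3x design):

```latex
f_s = \frac{1}{2\pi\sqrt{LC}}
    = \frac{1}{2\pi\sqrt{(1\times10^{-6})(100\times10^{-12})}}
    = \frac{1}{2\pi \times 10^{-8}}
    \approx 15.9\ \text{MHz}

% First-order sensitivity of the resonant frequency to a change in C:
\frac{\Delta f_s}{f_s} \approx -\frac{1}{2}\,\frac{\Delta C}{C}
\quad\Rightarrow\quad
\frac{\Delta C}{C} = 1\% \;\;\text{gives}\;\; \left|\frac{\Delta f_s}{f_s}\right| \approx 0.5\% = 5000\ \text{ppm}
```

A capacitor drift of only 1% therefore shifts the frequency by about 5000 ppm, which is far coarser than the tolerance of a quartz crystal.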
9.2.1 Behavior and Operation of Quartz Crystal and Ceramic Resonators


The oscillator circuitry built into the ’C3x devices is designed for use with a 
quartz crystal or ceramic resonator as the frequency-controlling element. 


Quartz crystal and ceramic resonators are resonating components made with
materials that have specific piezoelectric properties. Piezoelectric materials
deform mechanically in the presence of an electric potential; conversely,
mechanical stress on the material produces a voltage. This property makes a
very stable resonator, since the frequency of mechanical vibration is
controlled precisely by the size, shape, and material properties of the crystal
or ceramic used. In fact, many quartz crystal resonators are so precise that
they operate within 10 parts per million (ppm) of the intended frequency.


Ceramic resonators are similar to quartz crystal resonators in physical struc- 
ture, but they are made from a polycrystalline ceramic instead of monocrystal- 
line quartz. The production process for the ceramic is much less expensive 
than for quartz, reducing the final cost of the resonator. However, the polycrys- 
talline structure of the ceramic vibrates within a wider range of frequency than 
a quartz crystal does, and consequently, the frequency control is not as precise 
as it is with quartz. While quartz crystal resonators can operate within 10 ppm
of the intended frequency, ceramic resonators generally operate within
5000 ppm. However, if accuracy better than 5000 ppm is not necessary,
ceramic resonators are a cost-effective alternative. Table 9-1 shows a
comparison of three types of resonators.


Table 9-1. Comparison of Resonator Types

Type      Relative Price   Adjustment      Frequency Tolerance   Long-Term Stability
LC        Very low         Necessary       ±20000 ppm            Fair
Ceramic   Low              Not necessary   ±5000 ppm             Excellent
Crystal   High             Not necessary   ±10 ppm               Excellent
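
To put the tolerances in Table 9-1 in concrete terms, a ppm figure can be converted to a frequency deviation in hertz. The helper below is a minimal sketch; the function name ppm_to_hz and the 20-MHz nominal frequency used in the usage note are illustrative assumptions.

```c
/*
 * Worst-case frequency deviation, in hertz, for a resonator with the
 * given nominal frequency and a tolerance expressed in parts per million.
 */
double ppm_to_hz(double nominal_hz, double tolerance_ppm)
{
    return nominal_hz * tolerance_ppm / 1.0e6;
}
```

For an assumed 20-MHz nominal frequency, a ±10 ppm crystal may be off by about 200 Hz, a ±5000 ppm ceramic resonator by about 100 kHz, and a ±20000 ppm LC resonator by about 400 kHz.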


This document assumes that a quartz crystal is being used as the resonator; 
however, the information applies equally to ceramic resonators, unless other- 
wise specified. 


Figure 9-2 shows a circuit model that is equivalent to a crystal. The graphs
illustrate the behavior of the magnitude of the crystal impedance and the
reactance of the crystal with frequency. The three components, Lx, Rx, and Cx,
model the electrical behavior related to the mechanical vibration of the crystal.
Lx and Cx control the resonant frequency according to the same equation
shown in Figure 9-1. Rx models the mechanical energy loss in the crystal and
is related to the power dissipation in the crystal. C0 is the capacitance of the
two electrodes. The dielectric of the quartz physically separates the two
electrodes. Together these components are a reasonably accurate electrical
model for the behavior of the crystal. Values for these model components are
usually available from the crystal manufacturer.


Figure 9-2. Crystal Equivalent Circuit Model

[Figure: series branch Lx, Rx, Cx in parallel with C0]

Notes: 1) C0 is the capacitance of the two electrodes.

2) Lx, Rx, and Cx model the electrical behavior related to the mechanical vibration of
the crystal; Lx and Cx control the resonant frequency according to the same equation
shown in Figure 9-1, and Rx models the mechanical energy loss in the crystal.


Like the series-LC resonator, crystals have an impedance minimum at a
frequency determined by Lx and Cx. This is the series-resonant frequency (fs).
The presence of C0 also introduces an impedance maximum at a frequency
determined by Lx and C0. This frequency is the parallel-resonant
frequency (fp). A graph of impedance magnitude that illustrates this behavior
is shown in Figure 9-3. The series-resonant frequency corresponds to the
natural mechanical vibration frequency of the crystal. The parallel-resonant
frequency is basically an electrical measurement phenomenon that results from
the resonance between Lx and C0 in the electrical model of the crystal and does
not occur naturally. Consequently, all crystal oscillators operate at or near their
series-resonant frequency.
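
For reference, the two resonant frequencies of the equivalent circuit can be written in the usual textbook form, neglecting the loss resistance Rx. The component values in the numerical illustration are hypothetical and are chosen only to show the typical scale of the separation between fs and fp:

```latex
f_s = \frac{1}{2\pi\sqrt{L_x C_x}},
\qquad
f_p = \frac{1}{2\pi\sqrt{L_x \dfrac{C_x C_0}{C_x + C_0}}}
    = f_s\sqrt{1 + \frac{C_x}{C_0}}

% Illustration with assumed values f_s = 20 MHz, C_x = 0.02 pF, C_0 = 5 pF:
f_p = 20\ \text{MHz} \times \sqrt{1 + \frac{0.02}{5}}
    = 20\ \text{MHz} \times \sqrt{1.004}
    \approx 20.04\ \text{MHz}
```

With these assumed values the two frequencies differ by only about 0.2%, which is why the inductive region between them is so narrow.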


Figure 9-3. Impedance Characteristics of Crystal 
Notes: 1) fs = series-resonant frequency
2) fp = parallel-resonant frequency


The graph in Figure 9-3 illustrates the behavior of the magnitude of the
impedance of the crystal, but the crystal’s phase response is also important in
oscillator design. Figure 9-4 shows the reactance of the crystal with frequency.
The reactance (and consequently the phase) is 0 at the series-resonant frequency
(fs), because at this frequency the reactances of Lx and Cx cancel each other.
At this frequency, the total impedance of the crystal is equal to the resistance
Rx.


Figure 9-4. Reactance Characteristics of Crystal 
Notes: 1) fs = series-resonant frequency
2) fp = parallel-resonant frequency


Below fs, the crystal appears capacitive (negative reactance). Between fs and
fp, the crystal appears inductive (positive reactance), and above fp the crystal
appears capacitive again. In an oscillator circuit, the crystal is always operated
at or slightly above the series-resonant frequency in the inductive region. The
capacitance C0 has little effect on the series-resonant point (fs), but in
combination with the external load on the crystal, the capacitance C0 affects
the parallel-resonant point (fp). For simplification of the circuit analysis, C0 is
sometimes considered part of the external load on the crystal.


When ordering a crystal, you must tell the manufacturer whether a
series-resonant or parallel-resonant crystal is required. The meaning of these
terms is slightly different from the series- and parallel-resonant frequency
terms (fs and fp) previously described. A series-resonant crystal is intended to
operate in a circuit with a low-impedance load across its terminals and,
consequently, resonates very close to the series-resonant frequency (fs). A
parallel-resonant crystal is intended to operate in a circuit with a
high-impedance load across its terminals and operates at some frequency
slightly above fs, where the crystal’s reactance is inductive. In this case, the
crystal attempts to resonate at the frequency at which its own inductive
reactance exactly cancels the capacitive reactance of the combination of C0
and an external capacitive load. If supplied with the desired frequency and the
external load to which the crystal will be connected, the manufacturer can
produce a crystal that meets both of these requirements. The oscillator circuit
used on the ’C3x devices requires a parallel-resonant crystal.


9.2.2 Crystal Response to Square-Wave Drive 


Figure 9-5(a) shows the equivalent circuit model of a crystal driven by a
step-function voltage source in series with a resistive load. In this figure, the
capacitance C0 of the crystal model is ignored because it is usually considered
part of the load on the crystal and does not strongly affect the series-resonant
frequency. When a step function excites a crystal, the crystal produces
damped sinusoidal oscillation at its series-resonant frequency, as shown in
Figure 9-5(b). The magnitude of the damping on the output waveform is
proportional to the magnitude of Rx.


The lowest natural frequency of the crystal is the fundamental frequency.
Depending on the design of the crystal, it can also have contributions to its
output waveform from odd multiples of the fundamental frequency, or overtones.
However, if the response at the fundamental frequency is considerably
stronger than the response at these overtone frequencies, the contribution of the
overtones to the output waveform is negligible.


If the step-function input is changed to a square-wave drive (a periodic set of
step functions) at the frequency of the fundamental, the output of the crystal
is sinusoidal, as shown in Figure 9-5(c). The source of the square wave
provides enough energy to overcome the damping in each cycle. Although a
square wave has a high content of odd overtones, the crystal resonates at its
fundamental frequency and strongly attenuates all other frequencies.
Consequently, the output of a crystal driven by a square wave is sinusoidal. If this
sinusoidal output is fed back to the input of an appropriately designed amplifier,
as shown in Figure 9-5(d), sustained oscillation is generated.


Figure 9-5. Crystal Response to a Square-Wave Drive

Notes: 1) C0 is the capacitance of the two electrodes.

2) Lx, Rx, and Cx model the electrical behavior related to the mechanical vibration of
the crystal; Lx and Cx control the resonant frequency according to the same equation
shown in Figure 9-1, and Rx models the mechanical energy loss in the crystal.
9.3 Pierce Oscillator Circuit


Figure 9-6 shows an oscillator circuit in its simplest form: an amplifier and a 
feedback network. This circuit must meet two requirements to sustain oscilla- 
tion: 


- The circuit must have positive feedback.
- The open-loop gain must be greater than 1.


In Figure 9-6, A is the gain of the amplifier and B is the gain of the feedback
network. For the circuit to have open-loop gain greater than 1, A x B must be
greater than 1. For the circuit to have positive feedback, the phase shift around
the loop must be 0 degrees (or n x 360°, where n = 0, 1, 2, 3, ...). If these
conditions are met, the output oscillates at a frequency determined by the
frequency-selective feedback network, and the amplitude increases until it
reaches the linearity limitation of the amplifier.
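
The two requirements can be stated compactly in the usual gain-and-phase form. The notation below treats A and B as frequency-dependent complex gains evaluated at the oscillation frequency; the symbol for that frequency and the numerical value of B in the illustration are assumptions introduced here:

```latex
% Oscillation conditions at the oscillation frequency \omega_0
|A(j\omega_0)\,B(j\omega_0)| > 1
\qquad\text{and}\qquad
\angle A(j\omega_0) + \angle B(j\omega_0) = n \cdot 360^\circ,\quad n = 0, 1, 2, \ldots

% Illustration: an inverting amplifier contributes 180 degrees, so the
% feedback network must supply the remaining 180 degrees. If the network
% attenuates the signal to one tenth (|B| = 0.1), the amplifier needs |A| > 10.
```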


Figure 9-6. Simple Form of an Oscillator Circuit 
There are many possible combinations of amplifiers, crystals, and
phase-shifting components (inductors and capacitors) that meet the above-specified
conditions for oscillation. One of the most common is a circuit based on the
Pierce oscillator. Figure 9-7 shows an ideal version of this circuit. The Pierce
oscillator uses an inverting amplifier, a parallel-resonant crystal as a resonator,
and two capacitors as phase-shifting elements and load for the crystal. This
circuit is used for several reasons:

- It has a large frequency range, from approximately 1 kHz to 200 MHz.

- It has high Q (because the load impedances are mostly capacitive and not
  resistive) and consequently exhibits very good stability.

- It maintains a high output signal while driving the crystal at a low power
  level. This is important at higher frequencies, where crystals are physically
  thinner and therefore have lower power-dissipation limits.

- The low-pass RC networks formed by the crystal and load capacitors tend
  to filter transient noise spikes, giving the circuit good noise immunity.


Figure 9-7. Pierce Circuit: Ideal Operation 
9.3.1 Oscillator Operation


The ideal circuit operates in the following manner. An input signal to the
amplifier appears at the output, phase-shifted by approximately 180°. If it is
assumed that at a certain frequency the impedance of C1 is much greater than R1,
then the phase shift of this RC network introduces another approximately 90°
phase shift. At the series-resonant frequency, the crystal appears to be a
resistor and forms another RC network with C2. If the impedance of C2 is much
greater than the series resistance (Rx) of the crystal, this network provides
another 90° phase shift. The total phase shift around the loop is now
180° + 90° + 90° = 360°. This phase shift meets one of the conditions for
oscillation. If the gain of the amplifier is high enough to overcome the losses in
the R1-C1-crystal(Rx)-C2 network for a total loop gain of greater than 1, then
the circuit meets both oscillation conditions and oscillates.


This explanation, however, is unrealistic because it ignores too many aspects
of real-world circuit effects. Figure 9-8 illustrates a more typical example of the
circuit behavior. In this case, the inverting amplifier has some phase delay,
which causes it to produce a phase shift somewhat greater than 180°,
depending on the frequency of operation. If oscillation is to occur, the passive
components are forced to compensate for this phase difference. The only way the
impedance of the load capacitances can change is when the frequency of
operation changes. The frequency of operation tends to move above the
series-resonant frequency, lowering the impedance of the load capacitances and
raising the impedance of the crystal as it goes from being purely resistive to
being both resistive and inductive (see Figure 9-2(c)). When the frequency
changes such that the loop phase shift once again equals 360°, the circuit
oscillates at the higher frequency. For this reason, most Pierce circuits operate
5 to 40 ppm above the series-resonant frequency. This explanation more closely
reflects the circuit’s actual behavior and explains why a parallel-resonant
crystal always operates slightly above the series-resonant frequency.


Figure 9-8. Pierce Circuit: Actual Operation 
When a square-wave output is desired (such as for a microprocessor clock
source), the Pierce circuit sometimes is implemented in the manner shown in
Figure 9-9. The crystal and load capacitances are in the same configuration
as the circuit shown in Figure 9-8, with the exception that R1 is replaced with
the output impedance of the inverter. In the linear region, the inverter behaves
like a linear inverting amplifier. The resistor (Rf) is introduced across the
inverter to bias it into the linear region. This is the transition region between
the two digital states, as shown in Figure 9-11. Otherwise, the inverter
output moves toward one of its two stable digital states and oscillation does
not start because there is no gain in these regions (the output characteristic
shown in Figure 9-11 is flat).


Figure 9-9. Pierce Circuit for Square-Wave Output 
The removal of R1 from the circuit improves the loop gain and thus improves
the likelihood of oscillation. However, removing R1 also increases the drive
level (power dissipation) on the crystal. The power dissipation limit of the
crystal must not be exceeded under these conditions (power dissipation issues
are discussed in Section 9.4.4). Otherwise, the circuit operation is
identical to that described for Figure 9-8.


The second inverter is added as a buffer and a waveshaping device. Since the 
output of the crystal is sinusoidal, the output of the first inverter also is sinusoi- 
dal. The second inverter provides a rail-to-rail square-wave output at the 
oscillation frequency to drive the microprocessor clock. 
9.3.2 Pierce Oscillator Configuration for the TMS320C30 and TMS320C31


The ’C3x DSPs have two options for clocking the processor:

- Divide-by-2 operation of an externally supplied clock
- Divide-by-2 operation using the internal oscillator


To use the ’C3x internal oscillator, connect the crystal across the X2/CLKIN
and X1 pins of the ’C30 and ’C31 (the ’C32 does not support the internal
oscillator option).


The ’C3x oscillator circuitry (with the exception of the crystal and the load
capacitors) is integrated into the processor. Figure 9-10 shows the ’C3x
oscillator circuitry, which is similar to the Pierce integrated circuit oscillator
shown in Figure 9-9. On the ’C3x, the waveshaping inverter (I2) takes its input
from the input side of the inverter being used as the amplifier (I1) rather than
from the output as in the Pierce oscillator. This has little effect on the oscillator
other than generating the digital complement of the clock that is generated in
the circuit of Figure 9-9. Also, the feedback resistor in Figure 9-9 is integrated
into the ’C3x as an active-load transistor-feedback network, so an external
feedback resistor is unnecessary. This feedback network ensures that the inverter
I1 is biased in its linear region.


Figure 9-10. TMS320C3x Oscillator Circuitry 
The inverters in the oscillator circuitry differ from the usual CMOS inverter
configuration (shown in Figure 9-11) in that the p-channel transistor is biased as
an active load instead of having the gate connected as the input of the inverter.
This difference is part of the biasing scheme, which helps to ensure that the
oscillator starts when power is applied. This design causes the rise and fall
times to be asymmetrical (the rise time is longer than the fall time),
but since the oscillator output is divided by 2 before driving the internal
processor circuitry, the duty cycle of the final clock (H1 or H3) is 50%.


Figure 9-11. Digital Inverter Circuit and Its Transfer Characteristic 
9.3.3 Overtone Operation of the Oscillator


Although crystals are usually considered to vibrate at only one frequency, they 
also resonate at odd multiples, or overtones, of the series-resonant frequency. 
The series-resonant frequency is the fundamental frequency of the crystal, 
and the odd overtones are odd multiples of the fundamental frequency (for ex- 
ample: 3x, 5x, 7x, ...). For low frequencies, it is common to operate crystals at 
their fundamental frequency. For higher frequencies, the crystal is made thin- 
ner. The thinner the crystal is, the more fragile and expensive it becomes. Thin- 
ner crystals also have a low-power dissipation limit and damage easily when 
overdriven. 


Most fundamental mode crystals operate at frequencies of 40 MHz or less. To 
generate frequencies higher than 40 MHz, it is common to use overtone crys- 
tals. Overtone crystals are optimized for operation at an overtone frequency 
with the fundamental frequency attenuated. Figure 9-12 illustrates the imped- 
ance of a crystal with respect to frequency. The strongest change in imped- 
ance is at the fundamental frequency, but there is also a response at the third 
and fifth overtones. If a crystal with the properties in Figure 9-12 is used in a
Pierce circuit, it oscillates at the fundamental frequency. However, if the funda- 
mental frequency is attenuated, the crystal circuit oscillates at the next higher 
odd overtone, in this case, the third overtone. High-frequency operation is 
achieved by using an overtone crystal and attenuating the fundamental fre- 
quency. 




Figure 9-12. Impedance Characteristics of a Crystal


For the Pierce circuit used on the ’C3x, this attenuation of the fundamental
frequency is achieved by capacitively coupling an inductor (L1) in parallel with
the load capacitor (C1), as shown in Figure 9-13. The value of L1 is chosen to
resonate with C1 at some intermediate frequency between the frequency of the
desired overtone and the next lower odd overtone. At the desired overtone
frequency, the impedance of L1 is high enough compared to C1 that L1 can be
neglected, and the network of C1 and the inverter’s output impedance provides
the near-90° phase lag desired. Since the phase conditions are met, the circuit
oscillates at this frequency. At all lower overtones, L1 has a lower impedance
than C1 and causes a 90° phase lead instead of a phase lag. At any of these
lower frequencies, the total phase shift around the feedback loop is 180°, not
360°. This is negative feedback, which stabilizes the circuit and prevents
oscillation. L1 is coupled with a 0.1 µF capacitor, which prevents the inductor
from altering the dc bias of the inverter while adding negligible impedance at
the oscillation frequency.


Figure 9-13. Oscillator Circuit for Overtone Crystal Operation 


(Schematic: ’C3x pins X2/CLKIN and X1 with the crystal, load capacitors C1 and
C2, and L1 coupled through a 0.1 µF capacitor.)


As an example, assume a 60-MHz third-overtone crystal is used with 10 pF
load capacitors. The fundamental for this crystal is at 60/3 = 20 MHz. L1 must
be chosen to resonate with C1 at a frequency between 20 and 60 MHz. If you
choose the frequency halfway in between, 40 MHz, the value of L1 is calculated
as follows:


$$L_1 = \frac{1}{\omega^2 C_1} = \frac{1}{4\pi^2 f^2 C_1} = \frac{1}{4\pi^2 (40 \times 10^{6})^2 (10 \times 10^{-12})} = 1.58\ \mu\text{H}$$


Since the value of this inductance is not critical, the closest conveniently
available inductor is used as long as the resonant frequency of L1-C1 falls
between the desired overtone and the next lower overtone.
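The calculation above can be sketched in a few lines of Python. This is only an illustration, not part of the original guide: the function names are hypothetical, the inductor and capacitor are treated as ideal, and the resonant frequency is placed halfway between the desired overtone and the next lower odd overtone, as in the worked example. The second call checks where the 3.3 µH and 10 pF values listed in Table 9-2 resonate.

```python
import math


def lc_resonant_frequency(l_henries, c_farads):
    """Resonant frequency of an ideal L-C pair: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henries * c_farads))


def trap_inductor(f_overtone_hz, overtone, c1_farads):
    """Return (L1, f_res), with f_res halfway between the desired overtone
    and the next lower odd overtone (the choice made in the worked example)."""
    if overtone < 3 or overtone % 2 == 0:
        raise ValueError("overtone must be an odd integer >= 3")
    f_lower = f_overtone_hz * (overtone - 2) / overtone  # next lower odd overtone
    f_res = (f_lower + f_overtone_hz) / 2.0
    l1 = 1.0 / ((2.0 * math.pi * f_res) ** 2 * c1_farads)
    return l1, f_res


# 60-MHz third-overtone crystal with 10 pF load capacitors (values from the text)
l1, f_res = trap_inductor(60e6, 3, 10e-12)
print(f"L1 = {l1 * 1e6:.2f} uH for resonance at {f_res / 1e6:.0f} MHz")

# Where does a 3.3 uH inductor resonate with 10 pF?
f_table = lc_resonant_frequency(3.3e-6, 10e-12)
print(f"3.3 uH with 10 pF resonates near {f_table / 1e6:.1f} MHz")
```

For the 60-MHz example this gives about 1.58 µH, and a 3.3 µH inductor with 10 pF resonates near 27.7 MHz, which lies between the 20-MHz fundamental and the 60-MHz overtone.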


A variety of crystals have been evaluated in this circuit. Although at higher fre- 
quencies, fifth-overtone crystals are more commonly available, they are not 
recommended for this circuit. The available gain from the internal inverting am- 
plifier limits this configuration to third-overtone crystals. Several third-overtone 
crystal solutions for this circuit up to 60 MHz are listed in Table 9-2 on page
9-22.


9.4 Design Considerations


This section discusses some of the aspects of the design of the oscillator and 
their effects on its operation. 


9.4.1 Crystal Series Resistance (Rx)


The series resistance of the crystal has a strong effect on the design of the
oscillator, primarily in loop gain. Rx limits the crystal’s minimum impedance
value (seen at series resonance). Since the impedances of Lx and Cx cancel each
other at this frequency, the impedance of the crystal is due entirely to Rx. The
voltage divider formed by the crystal and C2 influences the loop gain. As the
impedance of the crystal becomes larger, the loss of gain due to the voltage
divider becomes greater. Low loop gain causes the oscillator to take longer to
start up and prevents oscillation if the overall loop gain falls below 1. Higher
crystal series resistance also reduces the overall oscillator circuit Q,
resulting in poorer frequency stability. For these reasons, it is desirable to
use the lowest Rx possible. Crystals with series resistance of 40 ohms or less
are recommended.


9.4.2 Load Capacitors 


In the Pierce circuit used on the ’C3x, the load capacitors have a strong effect
on how far above the series-resonant frequency the crystal oscillates. The
crystal’s shunt-terminal capacitance, C0, is considered part of the crystal’s
external-load capacitance as far as the frequency-controlling elements (Cx and
Lx) are concerned. A parallel-resonance oscillator circuit operates at the
frequency where the reactances of the crystal (Cx and Lx) cancel the
reactances from the load (C0, C1, C2). Consequently, changes in the
external-load capacitance cause the oscillator to change frequency to
compensate for the phase change. The following formula gives an
approximate value for the frequency shift from the series-resonant frequency:


$$\Delta f \approx \frac{f\,C_0}{2r\,(C_0 + C_L)} \quad \text{where } r = \frac{C_0}{C_x} \text{ and } C_L = C_1 + C_2$$


The derivative of this formula, as shown below, is useful for determining the 
frequency variance due to changes in the load capacitance. This derivative is 
applied to find the frequency range implied by a load capacitance with a given 
tolerance. Also, if there is a need to adjust the operating frequency, use this 
formula to determine the appropriate value of a variable load capacitor. 


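$$\delta(\Delta f) \approx -\frac{\Delta C_L\, f\,C_0}{2r\,(C_0 + C_L)^2} \quad \text{for a load-capacitance change } \Delta C_L$$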
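As a numerical illustration, the sketch below evaluates the frequency shift and its sensitivity to load capacitance. It assumes the approximation Δf ≈ f·C0 / (2r(C0 + CL)) with r = C0/Cx and f taken as the series-resonant frequency. The function names and the crystal values (C0 = 5 pF, r = 250) are hypothetical and chosen only for the example. Use the manufacturer's data for a real crystal.

```python
def pull_from_series(f_series_hz, c0, r, c_load):
    """Approximate shift above series resonance: f*C0 / (2*r*(C0 + CL))."""
    return f_series_hz * c0 / (2.0 * r * (c0 + c_load))


def pull_sensitivity(f_series_hz, c0, r, c_load):
    """Derivative of the shift with respect to CL, in Hz per farad.
    It is negative: more load capacitance pulls the frequency down."""
    return -f_series_hz * c0 / (2.0 * r * (c0 + c_load) ** 2)


# Assumed example values (not from the guide): 40-MHz crystal, C0 = 5 pF, r = 250
F_S, C0, R, C_LOAD = 40e6, 5e-12, 250.0, 20e-12

shift_hz = pull_from_series(F_S, C0, R, C_LOAD)
hz_per_pf = pull_sensitivity(F_S, C0, R, C_LOAD) * 1e-12

print(f"shift above series resonance: {shift_hz:.0f} Hz")
print(f"sensitivity: {hz_per_pf:.0f} Hz per pF of load capacitance")
```

With these assumed values the shift is about 16 kHz, and each additional picofarad of load capacitance lowers the frequency by roughly 640 Hz, so a 20 pF load with a 5% tolerance (1 pF) moves the frequency by about 640 Hz in either direction.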


Crystal manufacturers often accommodate requests for specific values for
load capacitance to be used with their crystals. Values of 20 pF and 30 pF are
commonly available. These load capacitance values are represented by C1 +
C2, so for a crystal designed for load capacitance of 20 pF, C1 = C2 = 10 pF is
used. Capacitance values higher than 30 pF increase attenuation, lowering
the overall loop gain. Capacitance values this high can cause the circuit to stop
oscillating. A load capacitance of 20-30 pF is recommended for high-frequency
crystals. Ceramic resonators usually require higher load capacitance than
high-frequency crystals (see the manufacturer’s recommendations). Load
capacitance values are included in Table 9-2 on page 9-22.


9.4.3 Loop Gain


Loop gain primarily affects the startup time of the oscillator. Overall loop gain 
must be greater than 1 for oscillation to be sustained. Higher loop gain causes 
the oscillation amplitude to increase rapidly, therefore reducing the time nec- 
essary for the oscillator to reach its steady state. 


The minimum gain measured for the ’C3x inverter is 5.6. To maintain an overall
loop gain of at least 1, the external component network of C1-crystal-C2 must
not attenuate the signal by more than a factor of 5.6. For this reason, the
values of the load capacitance and crystal series resistance have a strong
effect on whether the circuit oscillates.
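The loop-gain budget can be expressed as a short check. This is a sketch only: the 5.6 figure comes from the text, while the function names and the example network gains are hypothetical, and the external network is reduced to a single voltage-gain number that would have to be measured or simulated.

```python
import math

MIN_INVERTER_GAIN = 5.6  # minimum measured 'C3x inverter gain, from the text


def max_network_loss_db(inverter_gain=MIN_INVERTER_GAIN):
    """Largest network attenuation, in dB, that still leaves a loop gain of 1."""
    return 20.0 * math.log10(inverter_gain)


def loop_gain(network_gain, inverter_gain=MIN_INVERTER_GAIN):
    """Overall loop gain: inverter gain times the network's voltage gain."""
    return inverter_gain * network_gain


def can_oscillate(network_gain, inverter_gain=MIN_INVERTER_GAIN):
    """Oscillation is sustained only if the overall loop gain exceeds 1."""
    return loop_gain(network_gain, inverter_gain) > 1.0


print(f"loss budget: {max_network_loss_db():.2f} dB")
print(can_oscillate(0.25))  # network passes 25% of the signal
print(can_oscillate(0.15))  # network passes 15% of the signal
```

A network that passes a quarter of the signal leaves a loop gain of 1.4, while one that passes only 15% leaves 0.84, which is below the threshold.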


9.4.4 Drive Level/Power Dissipation 


Another parameter specified when ordering a crystal is the drive level or power 
dissipation. Higher frequency crystals generally have lower power dissipation 
ratings because the crystal is physically thinner and is damaged by excessive 
voltages. Power dissipation also affects frequency stability because the crys- 
tal’s frequency of operation is dependent on temperature. Excessive power 
dissipation causes crystal heating and results in frequency drift. 


There is no convenient way to measure the power dissipation in the crystal.
The series resistance (Rx) is the only power-dissipating component in the
crystal. Measuring the external voltage on the crystal includes the voltage
across Lx and Cx. Therefore, the power dissipation in Rx cannot be easily
calculated directly from the voltage on the crystal. It is necessary to measure
the current through the crystal using a current probe or to indirectly measure
the current by measuring the voltage across a small resistor in series with the
crystal. You can then calculate the power by using I²Rx.
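The indirect measurement can be sketched as follows. The function name and the numbers (a 1-ohm sense resistor reading 5 mV RMS, and a 40-ohm crystal series resistance) are assumptions made for illustration. The sketch also assumes the reading is an RMS voltage and that the sense resistor is small enough not to disturb the circuit.

```python
def crystal_power_watts(v_sense_rms, r_sense_ohms, r_x_ohms):
    """Power dissipated in the crystal's series resistance Rx.

    The crystal current is inferred from the RMS voltage across a small
    series sense resistor, then P = I^2 * Rx.
    """
    i_rms = v_sense_rms / r_sense_ohms
    return i_rms ** 2 * r_x_ohms


# Assumed example: 5 mV RMS across 1 ohm, crystal Rx = 40 ohms
p = crystal_power_watts(5e-3, 1.0, 40.0)
print(f"crystal power: {p * 1e3:.2f} mW")
```

In this example the current is 5 mA, so the crystal dissipates about 1 mW.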
Once the drive level is known, if it is necessary to limit the drive level to
the crystal, one of the simplest ways to do so is shown in Figure 9-14. A
resistor (Rg) is added in series between X1 and the external components. This
resistor drops part of the voltage driven by the ’C3x and consequently lowers
the drive voltage on the crystal. The disadvantage to this method is that the
voltage drop reduces the overall loop gain of the oscillator circuit. The value
of Rg must be large enough to bring the power dissipation of the crystal within
the manufacturer’s specification, but not so large that the loop gain drops
below 1 and the circuit no longer oscillates. Crystals with power dissipation
ratings of at least 1 mW are recommended.


The oscillator circuit solutions in Table 9-2, when operated without Rg, have
yielded crystal power dissipation measurements near 1 mW. Differences in
circuit and crystal parameters can cause the power dissipation in the crystal
to slightly exceed 1 mW. If crystal power dissipation is critical, either add a
resistor (Rg) with a value of 33 Ω to limit the crystal power dissipation or
obtain crystals with power dissipation ratings higher than 1 mW. When
operated with Rg = 33 Ω, each of the circuit solutions shown in Table 9-2 has
exhibited less than 1 mW crystal power dissipation.


Figure 9-14. Addition of Rg to Limit Drive Level of the Crystal


9.4.5 Startup Time


Figure 9-15 shows that when the oscillator starts, low-amplitude oscillations 
gradually build until the linearity limit of the amplifier is reached. You experi- 
ence this startup time at power-up. Maximizing loop gain minimizes the startup 
time for the oscillator. 


Startup time depends on the external components used, but generally 
requires at least 100 ms after power up for the oscillator to stabilize. For this 
reason, a reset delay of 150-200 ms is recommended following power up. 


Figure 9-15. Oscillator Startup


9.4.6 Frequency-Temperature Characteristics of Crystals


The actual operating frequency of a crystal depends on temperature. The ex-
tent to which frequency changes with respect to temperature strongly relates 
to the cut of the crystal. AT- and SC-cut crystals behave differently from DT-, 
CT-, and BT-cut crystals. Even slight changes in the cut angle of the crystal can 
strongly affect the frequency-temperature characteristics. 


Most crystals available in the frequency range of interest for DSPs are AT-cut 
crystals. The frequency-temperature characteristic for AT-cut crystals is a 
third-order function, similar to that shown in Figure 9-16. This graph shows the
general temperature-frequency behavior of AT-cut crystals. Similar information
is readily available from crystal manufacturers.


Figure 9-16. Example Frequency-Temperature Characteristic of AT-Cut Crystals


Crystal aging is the gradual change in the frequency of a crystal over time. This
change occurs due to stress relief between the mounting structure and the
electrodes and absorption (or desorption) of contaminants from the resonator
surfaces. Changes in temperature accelerate both of these mechanisms.
The major mechanism for aging in crystals above 1 MHz is mass transfer to
and from the resonator surfaces. The most rapid aging occurs early in the
crystal’s lifetime, and then aging tends to stabilize. For example, a crystal
that ages 10-60 parts per million (ppm) in a year experiences 5 ppm of that
aging in the first month. Crystals are available (at additional expense) that
have very low aging rates, due to cleaner fabrication and packaging processes.
These crystals have aging characteristics as low as 1 × 10⁻⁸ ppm per year.
Complete information on aging characteristics is available from crystal
manufacturers.


9.5 Oscillator Solutions for Common Frequencies


The oscillator solutions in this section were built and tested with samples from
the manufacturers listed in Table 9-2. These circuits were tested at room
temperature and verified to operate correctly within the recommended range of
VDD (4.75-5.25 V).


Table 9-2. Oscillator Solutions by Frequency 


Frequency  Mode            Type     Supplier  Part Number          C1, C2  Rg (Ω)  L1
40 MHz     Fundamental     Crystal  SaRonix   HFX series crystals  10 pF   0/33†   -
40 MHz     Third overtone  Crystal  Anderson  011-668-04663        10 pF   0/33†   3.3 µH
50 MHz     Fundamental     Crystal  SaRonix   HFX series crystals  10 pF   0/33†   -
50 MHz     Third overtone  Crystal  SaRonix   SRX5223              10 pF   0/33†   3.3 µH
60 MHz     Third overtone  Crystal  Anderson  011-668-04725        10 pF   0/33†   3.3 µH


† When these circuits are operated without Rg, they yield crystal power
  dissipation measurements near 1 mW. Differences in circuit and crystal
  parameters can cause the power dissipation in the crystal to slightly exceed
  1 mW. If crystal power dissipation is critical, it is recommended that 33 Ω of
  Rg be added to limit the crystal power dissipation or that crystals with power
  dissipation ratings higher than 1 mW be obtained. When operated with
  Rg = 33 Ω, each of the circuits shown exhibited less than 1 mW crystal power
  dissipation.


The following circuits are used for ceramic resonators and fundamental-mode
crystal resonators. The circuit in Figure 9-17 is used for all circuits marked
fundamental mode in Table 9-2. The circuit in Figure 9-18 is used for all circuits
marked third-overtone mode in Table 9-2. Crystals used in these circuits must
be parallel resonant with a series resistance of 40 ohms or less and must have
a power dissipation rating of 1 mW or greater.


Figure 9-17. Fundamental-Mode Circuit


Figure 9-18. Third-Overtone Circuit


Chapter 10 


XDS510 Emulator Design Considerations


This chapter explains the design requirements of the XDS510™ emulator and
discusses the Extended Development System (XDS) cable (manufacturing
part number 2617698-0001). This cable is identified by a label on the cable
pod marked JTAG3/5V and supports both standard 3-V and 5-V target system
power inputs.


The term JTAG emulation, as used in this book, refers to TI scan-based
emulation, which is based on the IEEE 1149.1 standard.


Topic                                                                  Page
10.1 Designing the MPSD Emulator Connector (12-Pin Header) ....... 10-2
10.2 Emulator Cable Pod Logic .................................... 10-3
10.3 MPSD Emulator Cable Signal Timing ........................... 10-4
10.4 Connections Between the Emulator and the Target System ...... 10-5
10.5 Mechanical Dimensions for the 12-Pin Emulator Connector ..... 10-8
10.6 Diagnostic Applications ..................................... 10-10




10.1 Designing the MPSD Emulator Connector (12-Pin Header) 


The ’C3x uses modular port scan device (MPSD) technology to allow complete
emulation through a serial scan path of the ’C3x. To communicate with the
emulator, your target system must have a 12-pin header (2 rows of 6 pins) with
the connections that are shown in Figure 10-1. To use the target cable, supply
the signals shown in Table 10-1 to a 12-pin header with pin 8 cut out to provide
keying. For the latest information, see the JTAG/MPSD Emulation Technical
Reference.


Although you can use other headers, the recommended header is the
unshrouded, straight header having the following DuPont connector systems
part numbers:


- 65610-112
- 65611-112
- 67996-112
- 67997-112


Figure 10-1. 12-Pin Header Signals and Header Dimensions


  EMU1†       1    2   GND
  EMU0†       3    4   GND
  EMU2†       5    6   GND
  PD (VCC)    7    8   No pin (key)‡
  EMU3        9   10   GND
  H3         11   12   GND

  Header dimensions:
  Pin width: 0.025-in. square post
  Pin length: 0.235-in. nominal


† These signals must be pulled up with separate 20-kΩ resistors to VCC.

‡ While the corresponding female position on the cable connector is plugged to
  prevent improper connection, the cable lead for pin 8 is present in the cable
  and is grounded as shown in the schematics and wiring diagrams in this
  document.


Table 10-1. 12-Pin Header Signal Descriptions and Pin Numbers


XDS510 Signal  Description       ’C30 Pin Number  ’C31 Pin Number
EMU0           Emulation pin 0   F14              124
EMU1           Emulation pin 1   E15              125
EMU2           Emulation pin 2   F13              126
EMU3           Emulation pin 3   E14              123
H3             ’C3x H3           A1               82
PD             Presence detect. Indicates that the emulation cable is connected
               and that the target is powered up. PD must be tied to VCC in the
               target system.




10.2 Emulator Cable Pod Logic 


Figure 10-2 shows a portion of logic in the emulator cable pod. The 33-Ω
resistors have been added to the EMU0, EMU1, and EMU2 lines to minimize cable
reflections.


Figure 10-2. Emulator Cable Pod Interface 


(Schematic not reproduced. Labels in the figure: 74LVT240; 33-Ω series
resistors on EMU1, EMU0, and EMU2; 180-Ω and 270-Ω resistors to 5 V; 74F175;
EMU3 (pin 9); JP2; 74AS1004; PD (VCC, pin 7); 100 Ω; GND (pins 2, 4, 6, 8, 10,
12); H3 (pin 11).)


10.3 MPSD Emulator Cable Signal Timing 


Figure 10-3 shows the signal timings for the emulator cable pod. Table 10-2 
defines the timing parameters. The timing parameters are calculated from val- 
ues specified in the standard data sheets for the emulator and cable pod and 
are for reference only. Texas Instruments does not test or guarantee these tim- 
ings. 


Figure 10-3. Emulator Cable Pod Timings 


(Timing diagram: H3, EMU0-EMU2, and EMU3 waveforms with the numbered timing
intervals that are defined in Table 10-2.)


Table 10-2. Emulator Cable Pod Timing Parameters


No.  Reference          Description                    Min  Max  Unit
1    tH3 min, tH3 max   H3 period                      35   200  ns
2    tH3 high min       H3 high pulse duration         15        ns
3    tH3 low min        H3 low pulse duration          15        ns
4    td (EMU0, 1, 2)    EMU0, 1, 2 valid from H3 low   7    23   ns
5    tsu (EMU3)         EMU3 setup time to H3 high     3         ns
6    thd (EMU3)         EMU3 hold time from H3 high    11        ns
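A bench measurement can be compared against these reference values with a small script. This is a sketch under stated assumptions: the dictionary keys and function names are hypothetical, the limits are transcribed from Table 10-2 (which is for reference only and is not tested or guaranteed), and H3 is assumed to run at half the CLKIN frequency.

```python
# Min/max limits in nanoseconds from Table 10-2 (None means no limit is listed).
LIMITS_NS = {
    "h3_period": (35.0, 200.0),
    "h3_high": (15.0, None),
    "h3_low": (15.0, None),
    "emu012_valid_from_h3_low": (7.0, 23.0),
    "emu3_setup_to_h3_high": (3.0, None),
    "emu3_hold_from_h3_high": (11.0, None),
}


def h3_period_ns(clkin_hz):
    """H3 period in ns, assuming H3 runs at half the CLKIN frequency."""
    return 2.0e9 / clkin_hz


def out_of_range(measured_ns):
    """Return the names of measured parameters that fall outside the table limits."""
    bad = []
    for name, value in measured_ns.items():
        low, high = LIMITS_NS[name]
        if (low is not None and value < low) or (high is not None and value > high):
            bad.append(name)
    return bad


period = h3_period_ns(40e6)  # 40-MHz CLKIN
print(period)
print(out_of_range({"h3_period": period, "h3_high": 25.0, "h3_low": 25.0}))
```

A 40-MHz CLKIN gives a 50-ns H3 period, which falls inside the 35-200 ns range, so the second line prints an empty list.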


10.4 Connections Between the Emulator and the Target System 


It is extremely important to provide high-quality signals between the emulator 
and the ’C3x on the target system. In many cases, the signal must be buffered 
to produce high quality. The need for signal buffering can be divided into three 
categories, depending on the placement of the emulation header: 


- No signals buffered. In this situation, the distance between the emulation
  header and the ’C3x should be no more than 2 inches (see Figure 10-4).


Figure 10-4. Connections Between the Emulator and the TMS320C3x With No Signals
Buffered


(Schematic: over a distance of 2 inches or less, the TMS320C3x signals EMU0,
EMU1, EMU2, EMU3, and H3 connect directly to the same-named emulator header
pins; PD is tied to VCC, and the six header GND pins are grounded.)


- Transmission signals buffered. In this situation, the distance between
  the emulation header and the ’C3x is greater than 2 inches but less than
  6 inches. The transmission signals, H3 and EMU3, are buffered through
  the same package (see Figure 10-5).


Figure 10-5. Connections Between the Emulator and the TMS320C3x With Transmission
Signals Buffered


(Schematic: over a distance of 2 to 6 inches, EMU0, EMU1, and EMU2 connect
directly between the TMS320C3x and the emulator header, while EMU3 and H3 pass
through buffers; PD is tied to VCC, and the six header GND pins are grounded.)


- All signals buffered. The distance between the emulation header and the
  ’C3x is greater than 6 inches but less than 12 inches. All ’C3x emulation
  signals, EMU0, EMU1, EMU2, EMU3, and H3, are buffered through the
  same package (see Figure 10-6).


Figure 10-6. Connections Between the Emulator and the TMS320C3x With All Signals
Buffered


(Schematic: over a distance of 6 to 12 inches, EMU0, EMU1, EMU2, EMU3, and H3
all pass through buffers between the TMS320C3x and the emulator header; PD is
tied to VCC, and the six header GND pins are grounded.)


CAUTION: H3 buffer restrictions

Do not connect any devices between the buffered H3 output and the header!
Otherwise, you will degrade the quality of the signal.
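The three placement categories above can be summarized as a lookup on the header-to-device distance. This is an illustrative sketch only: the function name is hypothetical, and because the text does not say what to do at exactly 6 inches or at 12 inches and beyond, the sketch treats 6 inches conservatively (all signals buffered) and rejects anything outside the described ranges.

```python
ALL_SIGNALS = ("EMU0", "EMU1", "EMU2", "EMU3", "H3")


def signals_to_buffer(distance_inches):
    """Return the emulation signals to buffer for a given header distance."""
    if distance_inches <= 0:
        raise ValueError("distance must be positive")
    if distance_inches <= 2:
        return ()                # no signals buffered
    if distance_inches < 6:
        return ("EMU3", "H3")    # transmission signals buffered
    if distance_inches < 12:
        return ALL_SIGNALS       # all signals buffered (6 in. treated conservatively)
    raise ValueError("distances of 12 inches or more are not covered by the text")


for d in (1.5, 4.0, 8.0):
    print(d, signals_to_buffer(d))
```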


10.5 Mechanical Dimensions for the 12-Pin Emulator Connector 


The ’C3x emulator target cable consists of a 3-foot section of jacketed cable,
an active cable pod, and a short section of jacketed cable that connects to the
target system. The overall cable length is approximately 3 feet, 10 inches.
Figure 10-7 and Figure 10-8 show the mechanical dimensions for the target
cable pod and short cable. Note that the pin-to-pin spacing on the connector
is 0.10 inches in both the X and Y planes. The cable pod box is nonconductive
plastic with four recessed metal screws.


Figure 10-7. Pod/Connector Dimensions


(Drawing labels: emulator cable pod; connector; short, jacketed cable. See
Figure 10-8.)


Note: All dimensions are in inches and are nominal unless otherwise specified.


Figure 10-8. 12-Pin Connector Dimensions


(Drawing labels: connector, side view; connector, front view; cable; key at
pin 8 (blocked); pins 1, 3, 5, 7, 9, 11 in one row and pins 2, 4, 6, 8, 10, 12
in the other. Dimensions shown: 0.20, 0.10, 0.70, and 0.10.)


Note: All dimensions are in inches and are nominal unless otherwise specified.


10.6 Diagnostic Applications 


For system diagnostic applications or to embed emulation compatibility on
your target system, connect a ’C3x device directly to a TI ACT8990 test bus
controller (TBC) as shown in Figure 10-9. The TBC is described in the Texas
Instruments Advanced Logic and Bus Interface Logic Data Book. A TBC can
connect to only one ’C3x device.


Figure 10-9. TBC Emulation Connections for TMS320C3x Scan Paths 


(Schematic not reproduced. TBC signals shown: TMS0, TMS1, TDO, TCKO, TCKI,
TDI0, TDI1, TMS2/EVNT0, TMS3/EVNT1, TMS4/EVNT2, TMS5/EVNT3. ’C3x signals shown:
EMU0, EMU1, EMU2, EMU4 (’C30 only), H1 (clock), EMU3, EMU5 (’C30 only), EMU6
(’C30 only). A 22-kΩ resistor to VCC is also shown.)


Notes: 1) In a ’C3x design, the TBC can connect to only one ’C3x device.

       2) The ’C3x device’s H1 clock drives TCKI on the TBC. This is different
          from the emulation header connections where H3 is used.


Chapter 11 


Development Support and 
Part Ordering Information 


This chapter provides development support information, device part numbers, 
and support tool ordering information for the ’C3x. 


Each ’C3x support product is described in the TMS320 Family Development
Support Reference Guide. In addition, more than 100 third-party developers
offer products that support the TI TMS320 family. For more information, refer
to the TMS320 Third-Party Reference Guide.


For information on pricing and availability, contact the nearest TI field sales 
office or authorized distributor. 


Topic                                                  Page
11.1 Development Support ......................... 11-2
11.2 TMS320C3x Part Ordering Information ......... 11-7


11.1 Development Support 


This section describes the development support provided by Texas Instru- 
ments. 


11.1.1 Development Tools 


Texas Instruments offers an extensive line of development tools for the ’C3x 
generation of DSPs, including tools to evaluate the performance of the proces- 
sors, generate code, develop algorithm implementations, and fully integrate 
and debug software and hardware modules. These tools are described below. 


Code Generation Tools 
There are two types of code generation tools: 


- Optimizing ANSI C compiler. Translates ANSI C language directly into
  highly optimized assembly code. You can then assemble and link this code
  with the TI assembler/linker, which is shipped with the compiler. It supports
  both ’C3x and ’C4x assembly code. This product is currently available for
  the PC (DOS, DOS extended memory, and OS/2), VAX/VMS, and SPARC
  workstations. See the TMS320 Floating-Point DSP Optimizing C Compiler
  User’s Guide for detailed information.


- Assembler/linker. Converts source mnemonics to executable object code.
  It supports both ’C3x and ’C4x assembly code. This product is currently
  available for the PC (DOS, DOS extended memory, and OS/2). The
  ’C3x/’C4x assembler for the VAX/VMS and SPARC workstations is only
  available as part of the optimizing ’C3x/’C4x compiler. See the TMS320
  Floating-Point DSP Assembly Language Tools User’s Guide for detailed
  information.


System Integration and Debug Tools 
There are four types of system integration and debug tools: 


- Simulator. Simulates through software the operation of the ’C3x and can
  be used in C and assembly software development. This product is currently
  available for the PC (DOS and Windows) and SPARC workstations. See
  the TMS320C3x C Source Debugger User’s Guide for detailed information.


- XDS510 emulator. Performs full-speed in-circuit emulation with the ’C3x,
  providing access to all registers as well as to internal and external memory.
  It can be used in C and assembly software development and has the
  capability of debugging multiple processors. This product is currently
  available for the PC (DOS, Windows, and OS/2) and SPARC workstations. This
  product includes the emulator board (emulator box, power supply, and
  small computer system interface (SCSI) connector cables in the SPARC
  version), the ’C3x C source debugger software, and the JTAG cable.


  Because ’C3x and ’C5x XDS510™ emulators also come with the same
  emulator board (or box), you can buy the ’C3x C source debugger software
  as a separate product called the ’C3x C Source Debugger Conversion
  Software. This enables you to debug ’C3x/’C4x/’C5x applications with
  the same emulator board. The emulator cable that comes with the ’C5x
  XDS510 emulator is not compatible with the ’C3x. You need a JTAG
  emulation conversion cable. See the TMS320C3x C Source Debugger
  User’s Guide for detailed information on the ’C3x emulator.


- Evaluation module (EVM). Each EVM comes complete with a PC halfcard
  and software package. The EVM board contains the following:

  - A ’C30 and a 33-MFLOPS, 32-bit floating-point DSP
  - A 16K-word, zero-state SRAM, allowing coding of most algorithms
    directly on the board
  - A speaker/microphone-ready analog interface for multimedia,
    speech, and audio applications development
  - A multiprocessor serial port interface for connecting to multiple EVMs
  - A host port for PC communications


The system also comes with all the software required to begin applications 
development on a PC host. Equipped with a C and assembly language 
source-level debugger for the DSP, the EVM has a window-oriented, 
mouse-driven interface that enables the downloading, executing, and de- 
bugging of assembly code or C code. 




The ’C3x assembler/linker is also included with the EVM. For users who 
prefer programming in a high-level language, an optimizing ANSI C com- 
piler and an Ada compiler are offered separately. 


- Emulation porting kit (EPK). Enables you to integrate emulation technology
  directly into your system without the need of an XDS510 board. The
  EPK is intended to be used by third parties and high-volume board
  manufacturers and requires a licensing agreement with Texas Instruments.
  The kit contains host (or PC) source and object code, which lets
  you tailor ’C30 EVM-like capabilities to your ’C3x system through the
  SM74ACT8990 test bus controller (TBC). The EPK can be used in such
  applications as program download for system self test and initialization or
  system emulation and debug to feature resident emulation support. EPK
  software includes the TI high-level language (HLL) debugger in object as
  well as source code for the TBC communication interface. The HLL code
  is the windowed debugger found with many TI DSP simulators, EVMs, and
  emulators. With the EPK, the HLL user interface can be ported directly to
  the system board. The source code for the TBC communication interface
  consists of such commands as read/write, memory run, stop, and reset
  that communicate with the ’C3x device. Using the EPK reduces system
  and development cost and speeds time to market. For more information
  on the kit, call the DSP hotline at (281) 274-2320.


11.1.2 TMS320 Third Parties 


The TMS320 family is supported by product and service offerings from more 
than 100 independent vendors and consultants, known as third parties. These 
support products take various forms (both software and hardware) from cross- 
assemblers, simulators, and DSP utility packages to logic analyzers and
emulators. Additionally, TI third parties offer more than 150 algorithms that are
available for license through the TMS320 software cooperative. These algo- 
rithms can greatly reduce development time and decrease time to market. The 
expertise of those involved in support services ranges from speech encoding 
and vector quantization to software/hardware design and system analysis. 


For a more detailed description of services and products offered by third
parties, see the TMS320 Third Party Support Reference Guide and the TMS320
Software Cooperative Data Sheet Packet. Call the Literature Response Center
at (800) 477-8924 to request a copy.


11.1.3 Technical Training Organization (TTO) TMS320 Workshop 


The ’C3x DSP design workshop is tailored for hardware and software design 
engineers and decision-makers who design and use the ’C3x generation of 
DSP devices. Hands-on exercises throughout the course give participants a 
rapid start in using 'C3x design skills. Microprocessor/assembly language ex- 
perience is required. Experience with digital design techniques and C lan- 
guage programming experience is desirable. The following topics are covered 
in the ’C3x workshop: 


- ’C3x architecture/instruction set
- Use of the PC-based ’C3x software simulator and EVM
- Floating-point and parallel operations
- Use of the ’C3x assembler/linker
- C programming environment
- System architecture considerations
- Memory and I/O interfacing
- ’C3x development support


For registration, pricing, or enrollment information on this and other TTO 
TMS320 workshops, call (800) 336-5236, ext. 3904. 


11.1.4 TMS320 Literature


Extensive DSP documentation is available, including data sheets, user’s
guides, and application reports. In addition, DSP textbooks that aid research
and education have been published by Prentice-Hall, John Wiley and Sons,
and Computer Science Press. To order literature or to subscribe to the DSP
newsletter Details on Signal Processing (for up-to-date information on new
products and services), call the Literature Response Center at (800) 477-8924
or log on to the DSP Solutions web site at http://www.ti.com/dsps.


11.1.5 DSP Hotline


For answers to TMS320 technical questions on device problems, develop- 
ment tools, documentation, upgrades, and new products, you can contact the 
DSP hotline by: 


[J Phone at (281) 274-2320 Monday through Friday from 8:30 a.m. to 
5:00 p.m. Central Time 


Fax at (281) 274-2324 
Electronic mail at dsph@ti.com 


European fax at 33-1—3070-1032 


i a 


Semiconductor Product Information Center (PIC) at (214) 644-5580 
To ask about third-party applications and algorithm development packages,
contact the third party directly. See the TMS320 Third-Party Support
Reference Guide for addresses and phone numbers.


The DSP hotline does not provide pricing information. Contact the nearest TI 
field sales office or the TI PIC for prices and availability of TMS320 devices and 
support tools. 


11.1.6 Bulletin Board Service (BBS) 


The TMS320 DSP Bulletin Board Service (BBS) is a telephone-line computer 
service that provides information on TMS320 devices, specification updates 
for current or new devices and development tools. The BBS also gives infor- 
mation about silicon and development tool revisions and enhancements, new 
DSP application software as it becomes available, and source code for pro- 
grams from any TMS320 user’s guide. 


You can access the BBS by: 


- Modem: (300-, 1200-, or 2400-bps) dial (713) 274-2323. Set your modem
  to 8 data bits, 1 stop bit, no parity.
- Internet: Use anonymous ftp to ftp.ti.com (Internet port address
  192.94.94.1). The BBS content is located in the subdirectory called
  mirrors.


To find out more about the BBS, see the TMS320 Family Development Support 
Reference Guide. 


11.2 TMS320C3x Part Ordering Information


This section provides device and support tool part numbers. Table 11-1 lists
the part numbers for the ’C30 and ’C31; Table 11-2 gives ordering information
for ’C3x hardware and software support tools. An explanation of the TMS320
family device and development support tool prefix and suffix designators
follows the two tables to assist in understanding the TMS320 product numbering
system.


Table 11-1. TMS320C3x Digital Signal Processor Part Numbers

Device             Technology    Operating  Package Type           Typical Power
                                 Frequency                         Dissipation
TMS320C30GEL       0.8-µm CMOS   33 MHz     Ceramic 181-pin PGA    1.00 W
TMS320C30GEL40     0.8-µm CMOS   40 MHz     Ceramic 181-pin PGA    1.25 W
TMS320C31PQL/PQA   0.8-µm CMOS   33 MHz     Plastic 132-pin QFP    0.75 W
TMS320C31PQL40     0.8-µm CMOS   40 MHz     Plastic 132-pin QFP    0.90 W
TMS320LC31PQL      0.8-µm CMOS   33 MHz     Plastic 132-pin QFP    0.50 W
TMS320C031PQL50    0.8-µm CMOS   50 MHz     Plastic 132-pin QFP    1.00 W

SMJ320C0316FA27    0.8-µm CMOS   28 MHz     Ceramic 141-pin PGA    0.60 W
SMJ320C031HF627    0.8-µm CMOS   28 MHz     Ceramic 132-pin QFP    0.60 W
SMJ320C0316FA33    0.8-µm CMOS   33 MHz     Ceramic 141-pin PGA    0.75 W
SMJ320C0316HF633   0.8-µm CMOS   33 MHz     Ceramic 132-pin PGA    0.75 W

SMJ320C30GBM33     0.8-µm CMOS   33 MHz     Ceramic 181-pin PGA    1.10 W
SMJ320C30HF633     0.8-µm CMOS   33 MHz     Ceramic 196-pin QFP    1.10 W

SMJ320C30GBM28     0.8-µm CMOS   28 MHz     Ceramic 181-pin PGA    1.00 W
SMJ320C30HF628     0.8-µm CMOS   28 MHz     Ceramic 196-pin QFP    1.00 W
SMJ320C30HTM28     0.8-µm CMOS   28 MHz

SMJ320C30GBM25     0.8-µm CMOS   25 MHz     Ceramic 181-pin PGA    1.00 W
SMJ320C30HF625     0.8-µm CMOS   25 MHz     Ceramic 196-pin QFP    1.00 W
SMJ320C30HTM25     0.8-µm CMOS   25 MHz
Table 11-2. TMS320C3x Support Tool Part Numbers

(a) Software

Tool Description                      Operating System      Part Number
C Compiler & Macro Assembler/Linker   VAX/VMS               TMDS3243255-08
                                      PC-DOS/MS-DOS         TMDS3243855-02
                                      SPARC (Sun OS)†       TMDS3243555-08
Assembler/Linker                      PC-DOS/MS-DOS; OS/2   TMDS3243850-02
Simulator                             VAX/VMS               TMDS3243251-08
                                      PC-DOS/MS-DOS         TMDS3243851-02
                                      SPARC (Sun OS)†       TMDS3243551-09
Digital Filter Design Package         PC-DOS                DFDP
TMS320C3x Emulation Porting Kit       PC; SPARC             TMDX3240030

(b) Hardware

Tool Description                      Operating System      Part Number
XDS510 Emulator                       PC/MS-DOS             TMDS3240130
Evaluation Module (EVM)               PC/MS-DOS             TMDS3260030

† Note that SUN UNIX supports ’C3x software tools on the 68000 family-based
  SUN-3 series workstations and on the SUN-4 series machines that use the
  SPARC processor, but not on the SUN-386i series of workstations.
11.2.1 Device and Development Support Tool Prefix Designators


Prefixes to TI part numbers designate phases in the product’s development
stage for both devices and support tools, as shown in the following definitions:


Device Development Evolutionary Flow

- TMX: Experimental device that is not necessarily representative of the
  final device’s electrical specifications
- TMP: Final silicon device that conforms to the device’s electrical
  specifications but has not completed quality and reliability verification
- TMS: Fully qualified production device


Support Tool Development Evolutionary Flow

- TMDX: Development support product that has not yet completed TI’s
  internal qualification testing for development systems
- TMDS: Fully qualified development support product


TMX and TMP devices and TMDX development support tools are shipped with 
the following disclaimer: 


“Developmental product is intended for internal evaluation purposes.” 


Note: Prototype Devices


TI recommends that prototype devices (TMX or TMP) not be used in production
systems. Their expected end-use failure rate is undefined but predicted
to be greater than that of standard qualified production devices.


TMS devices and TMDS development support tools have been fully characterized,
and their quality and reliability have been fully demonstrated. TI’s
standard warranty applies to TMS devices and TMDS development support tools.


TMDX development support products are intended for internal evaluation
purposes only. They are covered by TI’s warranty and update policy for
microprocessor development systems products; however, they should be used by
customers only with the understanding that they are developmental in nature.
11.2.2 Device Suffixes


The suffix indicates the package type (for example, N, FN, or GE) and temper- 
ature range (for example, L). 


Figure 11-1 presents a legend for reading the complete device name for any 
TMS320 family member. 


Figure 11-1. TMS320 Device Nomenclature 


TMS 320 C 30 GE L

Fields, left to right: Prefix, Device Family, Technology, Device,
Package Type, Temperature Range

Prefix
  TMX = Experimental device
  TMP = Prototype device
  TMS = Qualified device
  SMJ = MIL-STD-883C

Device Family
  320 = TMS320 family

Technology
  C = CMOS
  E = CMOS EPROM
  P = OTP EPROM
  No letter = NMOS

Device
  1st-generation DSP: 14, 15, 16, 17
  2nd-generation DSP: 20, 25, 26
  3rd-generation DSP: 30, 31, 32
  4th-generation DSP: 40
  5th-generation DSP: 50, 51

Package Type
  FD = Leadless ceramic chip carrier
  FJ = Ceramic leaded chip carrier
  FN = Plastic leaded chip carrier
  FZ = Ceramic leaded chip carrier
  GB = Ceramic pin grid array
  GE = Ceramic pin grid array, glass seal
  HT = Ceramic quad flatpack (gull wing)
  HU = Ceramic quad flatpack
  JD = Ceramic dual in line package side brazed
  N  = Plastic dual in line package
  PQ = Plastic quad flatpack

Temperature Range
  H  = 0 to 50°C
  L  = 0 to 70°C
  S  = -55 to 100°C
  M  = -55 to 125°C
  A† = -40 to 85°C


† See electrical specifications for ’C31 PQA case temperature ratings
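

To illustrate how the legend in Figure 11-1 can be applied, the following
Python sketch splits a device name into its fields. It is only an illustration:
the function name decode_device_name, the dictionary layout, and the optional
trailing speed digits (as in TMS320C30GEL40 from Table 11-1) are assumptions
made for this example, and only the codes listed in the legend are recognized.

```python
import re

# Lookup tables transcribed from the Figure 11-1 legend.
PREFIXES = {
    "TMX": "Experimental device",
    "TMP": "Prototype device",
    "TMS": "Qualified device",
    "SMJ": "MIL-STD-883C",
}
TECHNOLOGIES = {"C": "CMOS", "E": "CMOS EPROM", "P": "OTP EPROM", "": "NMOS"}
PACKAGES = {
    "FD": "Leadless ceramic chip carrier",
    "FJ": "Ceramic leaded chip carrier",
    "FN": "Plastic leaded chip carrier",
    "FZ": "Ceramic leaded chip carrier",
    "GB": "Ceramic pin grid array",
    "GE": "Ceramic pin grid array, glass seal",
    "HT": "Ceramic quad flatpack (gull wing)",
    "HU": "Ceramic quad flatpack",
    "JD": "Ceramic dual in line package side brazed",
    "N": "Plastic dual in line package",
    "PQ": "Plastic quad flatpack",
}
TEMPERATURES = {
    "H": "0 to 50 C",
    "L": "0 to 70 C",
    "S": "-55 to 100 C",
    "M": "-55 to 125 C",
    "A": "-40 to 85 C",
}

# Prefix, family, optional technology letter, two-digit device, package,
# temperature range, and (assumption) optional trailing speed digits.
_NAME = re.compile(
    r"^(TMX|TMP|TMS|SMJ)(320)([CEP]?)(\d{2})"
    r"(FD|FJ|FN|FZ|GB|GE|HT|HU|JD|PQ|N)([HLSMA])(\d*)$"
)


def decode_device_name(name):
    """Return the legend meaning of each field in a TMS320 device name."""
    match = _NAME.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized TMS320 device name: {name!r}")
    prefix, _family, tech, device, package, temp, speed = match.groups()
    return {
        "prefix": PREFIXES[prefix],
        "family": "TMS320 family",
        "technology": TECHNOLOGIES[tech],
        "device": device,
        "package": PACKAGES[package],
        "temperature_range": TEMPERATURES[temp],
        "speed_suffix": speed,  # empty string when no speed digits are present
    }


fields = decode_device_name("TMS320C30GEL")
print(fields["package"], "/", fields["temperature_range"])
```

Names that use codes outside the legend (for example, the LC technology code
of the TMS320LC31) are rejected by this sketch rather than guessed at.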
TMS320C30 Power Dissipation


This chapter presents the information necessary to determine the power
supply current requirements for the ’C30 under different operating
conditions.


As device sophistication and levels of integration increase with evolving semi- 
conductor technologies, actual levels of power dissipation vary widely. These 
levels depend heavily on the particular application in which the device is used 
and the nature of the program being executed. In addition, due to the charac- 
teristics of CMOS technology, power requirements vary according to clock 
rates and data values being processed. Using this information, you can deter- 
mine the device’s power dissipation and, in turn, calculate thermal manage- 
ment requirements. 
12.1 Power Dissipation Characteristics


Generally, power supply current requirements are related to system factors
such as operating frequency, supply voltage, temperature, and output load. As
devices become more complex, the specification must also be based on what
the device does. CMOS devices inherently draw current only during switching
through the linear region. Therefore, the power supply current is related to the
rate of switching. Furthermore, since the output drivers of the ’C30 are specified
to drive direct current (dc) loads, the power supply current resulting from
external writes depends not only on the switching rate but also on the value of
the data written.


12.1.1 Power Supply Factors 


The power-supply current consists of four basic factors: 


- Quiescent current
- Internal operations
- Internal bus operations
- External bus operations


12.1.2 Power Supply Consumption Dependencies 
The power-supply current consumption depends on many factors. Four are
system-related:

- Operating frequency
- Supply voltage
- Operating temperature
- Output load


Several other factors are related to ’C30 operation. They include:

- Duty cycle of operations
- Number of buses used
- Wait states
- Cache usage
- Data value of internal and external bus


The total power supply current for the device is described in the following
equation, which applies the four basic power supply current factors and the
dependencies described above:

$$I = (I_q + I_{iops} + I_{ibus} + I_{xbus}) \times FV \times T$$

where:

$I_q$ = quiescent current

$I_{iops}$ = current from internal operations

$I_{ibus}$ = current from internal bus usage, including data value and cycle
time dependencies

$I_{xbus}$ = current from external bus usage, including data value, wait state,
cycle time, and capacitive load dependencies

$FV$ = scale factor for frequency and supply voltage

$T$ = scale factor for operating temperature


The application of this equation and the determination of all of the
dependencies are described in detail in this chapter.
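

As a minimal sketch, the equation above can be written as a small Python
function. The function and parameter names are hypothetical, and the default
scale factors of 1.0 are an assumption corresponding to the reference test
conditions described later in this chapter; actual FV and T values must be
taken from the device characterization data.

```python
def total_supply_current(i_q, i_iops, i_ibus, i_xbus, fv=1.0, t=1.0):
    """Total supply current in mA: I = (Iq + Iiops + Iibus + Ixbus) * FV * T.

    fv and t default to 1.0, which is an assumption for the reference
    conditions, not a value taken from the device specification.
    """
    return (i_q + i_iops + i_ibus + i_xbus) * fv * t


# Quiescent 110 mA, internal operations 55 mA, internal bus 85 mA,
# no external bus activity.
print(total_supply_current(110.0, 55.0, 85.0, 0.0))  # 250.0
```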


If a less detailed analysis is sufficient, use the minimum, typical, and maximum
values to determine a rough estimate of the power supply current requirements:

- The minimum power supply current requirement is 110 mA.
- The typical (average) current consumption is 200 mA, as described in
  the TMS320C30 Digital Signal Processor data sheet. This value is
  associated with most algorithms running on the device unless data output
  is excessive.
- If an extremely conservative approach is desired, use the maximum value.


Maximum Current Requirement

The maximum current requirement is 600 mA and occurs only
under worst-case conditions. These include writing alternating
data (AAAAAAAAh to 55555555h) out of both external buses
simultaneously, every cycle, with 80-pF loads, and running at
33 MHz.
12.1.3 Determining Algorithm Partitioning


Each part of an algorithm has its own pattern with respect to internal and exter- 
nal bus usage. To analyze the power supply current requirement, you must 
partition an algorithm into segments with distinct concentrations of internal or 
external bus usage. Analyze each program segment to determine its power 
supply current requirement. You can then calculate the average power supply 
current from the requirements of each segment of the algorithm. 
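

One way to combine the per-segment results is a time-weighted average, as in
the following Python sketch. The weighting by segment duration is an assumption
made for this example (the text above does not prescribe a weighting), and the
function name and the sample segment values are hypothetical.

```python
def average_supply_current(segments):
    """Return the time-weighted average current in mA.

    segments: sequence of (current_mA, duration) pairs, one per program
    segment. Durations may be in any consistent unit, such as H1 cycles.
    """
    segments = list(segments)
    total_time = sum(duration for _, duration in segments)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    weighted = sum(current * duration for current, duration in segments)
    return weighted / total_time


# Hypothetical program: 300 cycles at 165 mA followed by 100 cycles at 250 mA.
print(average_supply_current([(165, 300), (250, 100)]))  # 186.25
```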


12.1.4 Test Setup Description 


All ’C30 supply current measurements were performed on the test setup
shown in Figure 12-1. The test setup consists of a ’C30, 8K words of
zero-wait-state Cypress Semiconductor SRAMs (CY7C186-25PC), and
resistor/capacitor (RC) loads on all data and address lines. A Tektronix™
current probe (P6042) measures the power supply current in all VDD lines of
the device. The supply voltage on the output load is 2.15 V. Unless otherwise
specified, all measurements are made at a:

- Supply voltage of 5.0 V
- Input clock frequency of 33 MHz
- Capacitive load of 80 pF
- Operating temperature of 25°C


Figure 12-1. Current Measurement Test Setup for the TMS320C30 
12.2 Current Requirements for Internal Circuitry


The power supply current requirement for internal circuitry consists of the fol- 
lowing factors: quiescent current, internal operations, and internal bus opera- 
tions. Quiescent current and internal operations are constants, but the internal 
bus operations vary with the rate of internal bus usage and the data values be- 
ing transferred. 


12.2.1 Quiescent Current 


Quiescent current refers to the baseline supply current drawn by the ’C30
during minimal internal activity. It includes the current required to fetch an
instruction from on- or off-chip memory. Examples of quiescent current include:

- Maintaining timers and serial ports
- Executing the IDLE instruction
- ’C30 in HOLD mode pending external bus access
- ’C30 in reset
- Branching to self

The quiescent current requirement for the ’C30 equals 110 mA.


12.2.2 Internal Operations 


Internal operations include register-to-register multiplication, ALU operations,
and branches. They do not include external bus usage or significant internal bus
usage. Internal operations add a constant 55 mA above the quiescent current.
Therefore, the total contribution of quiescent current (110 mA) and internal
operations (55 mA) is 165 mA. During an RPTS instruction (repeat single
instruction), activity other than the instruction being repeated is suspended;
therefore, internal power supply current is related only to the operation
performed by the instruction being executed.


12.2.3 Internal Bus Operations 


Internal bus operations include all operations that use the internal buses 
extensively, such as internal RAM access every cycle. No distinction is made 
between internal reads (such as instruction or operand fetches from internal 
ROM or internal RAM banks) and internal writes (such as operand stores to 
internal RAM banks); internally they are equal. Since power consumption 
depends on the data value in the internal bus, significant use of internal buses 
adds a data-dependent factor to the power supply current. 
Pipeline conflicts, use of cache, fetches from external wait-state memory, and
writes to external wait-state memory all affect the internal and external bus
cycles of an algorithm executing on the ’C30. Therefore, you must determine
the algorithm’s internal bus usage in order to accurately calculate the power
supply current requirements. The ’C30 software simulator and XDS™ emulator
both provide benchmarking and timing capabilities that help you determine bus
usage.


The current resulting from internal bus usage varies exponentially with transfer
rates. Figure 12-2 shows the internal bus current requirements for transferring
alternating data (AAAAAAAAh to 55555555h). A transfer cycle time less than 1
implies multiple accesses per single H1 cycle (for example, when direct memory
access (DMA) is used). Transfer cycle times greater than 1 refer to single-cycle
transfers with one or more cycles between them. The minimum transfer cycle
time is one-third, which corresponds to three accesses in a single H1 cycle.


Figure 12-2. Internal Bus Current Versus Transfer Rate (AAAAAAAAh to 55555555h)
(Horizontal axis: transfer cycle time in H1 cycles)


The data set AAAAAAAAh to 55555555h exhibits the maximum current for 
these types of operations. Less current is required for transferring other data 
patterns, and current values can be derated accordingly. 


As the transfer rate decreases (transfer cycle time increases), the incremental
IDD approaches 0 mA. Transfer cycle times of more than seven H1 cycles do not
add any significant current. Figure 12-2 represents the incremental IDD from
internal bus operations, which is added to the quiescent and internal
operations current values.


For example, the maximum transfer rate corresponds to three accesses every
cycle, or a one-third H1 transfer cycle time. At this rate, 85 mA is added to the
quiescent (110 mA) and internal operation (55 mA) current values for a total
of 250 mA.


Figure 12-3 shows the data dependence of the internal bus current requirement
when the data is other than As followed by 5s. The shaded trapezoidal
region represents the internal bus current consumed for all possible data
values transferred. The lower line represents the scale factor for transferring
the same data (all 0s or all Fs). The upper line represents the scale factor for
transferring alternating data (all 0s to all Fs or all As to all 5s).


Figure 12-3. Internal Bus Current Versus Data Complexity Derating Curve
(Horizontal axis: relative data complexity)


The number of possible permutations of data values is quite large. The extent
to which data varies is referred to as relative data complexity. This term is
a relative measure of how much the data values change and how many bits
change state. Relative data complexity ranges from 0, signifying minimal
variation of data, to a normalized value of 1, signifying greatest data
variation.
If statistical knowledge of the data exists, Figure 12-3 can be used to
determine the power supply requirement more exactly according to internal bus
usage. For example, Figure 12-3 indicates a 63% scale factor when all Fs are
moved internally every cycle with two accesses per cycle. This scale factor is
multiplied by 55 mA (from Figure 12-2, at one-half H1 cycle transfer time),
yielding 34.65 mA from internal bus usage. Therefore, an algorithm running
under these conditions requires about 200 mA of power supply current
(110 + 55 + 34.65).


Since statistical knowledge of the data may not be readily available, a nominal
scale factor may be used. The median between the minimum and maximum
values at 50% relative data complexity yields a value of 0.80 and can be
used as an estimate of a nominal scale factor. You can use this nominal data
scale factor of 80% for internal bus data dependency, adding 44 mA
(0.80 × 55 mA) to 110 mA (quiescent current) and 55 mA (internal operations)
to yield about 210 mA. As an upper bound, assume worst-case conditions of three
accesses of alternating data every cycle, adding 85 mA (from Figure 12-2) to
110 mA (quiescent current) and 55 mA (internal operations) to yield 250 mA.
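

The three estimates above can be reproduced with a short Python sketch. The
function and constant names are hypothetical; the bus current values (55 mA
and 85 mA) and scale factors (0.63 and 0.80) are the figure readings quoted in
the text, not values computed by the code.

```python
QUIESCENT_MA = 110.0     # quiescent current (section 12.2.1)
INTERNAL_OPS_MA = 55.0   # internal operations (section 12.2.2)


def internal_circuitry_current(bus_ma, data_scale=1.0, internal_ops=True):
    """Supply current in mA for the internal circuitry.

    bus_ma: incremental internal bus current for alternating data, as read
            from the transfer-rate curve.
    data_scale: data-dependency scale factor (0 to 1) from the derating curve.
    """
    if not 0.0 <= data_scale <= 1.0:
        raise ValueError("data_scale must be between 0 and 1")
    current = QUIESCENT_MA
    if internal_ops:
        current += INTERNAL_OPS_MA
    return current + data_scale * bus_ma


print(round(internal_circuitry_current(55.0, 0.63), 2))  # 199.65
print(round(internal_circuitry_current(55.0, 0.80), 2))  # 209.0
print(round(internal_circuitry_current(85.0), 2))        # 250.0
```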
12.3 Current Requirement for Output Driver Circuitry


The output driver circuits on the ’C30 are required to drive significantly higher
dc and capacitive loads than internal device logic. Therefore, they are
designed to drive larger currents than internal devices. Because of this, output
drivers impose higher supply current requirements than other sections of
circuitry on the device.


Accordingly, the highest values of supply current are required when external
writes are performed at high speed. During reads, or when the external buses
are not in use, the ’C30 does not drive the data bus; this eliminates the most
significant factor of output buffer current. Furthermore, in typical cases, only
a few address lines change, or the whole address bus is static. Under these
conditions, an insignificant amount of supply current is consumed. When no
external writes are performed or when writes are performed infrequently,
current from output buffer circuitry can be ignored.


When external writes are performed, the current required to supply the output 
buffers depends on several factors: 


- Data pattern transferred
- Rate at which transfers are made
- Number of wait states implemented (because wait states affect rates at
  which bus signals switch)
- External bus dc and capacitive loading


External operations involve writes external to the device and constitute the
major power supply current factor. The power supply current for the external
buses is made up of three factors and is summarized in the following equation:

$$I_{base} + I_{prim} + I_{exp} = \text{power supply current for the external buses}$$

where:

$I_{base}$ = 60-mA baseline current

$I_{prim}$ = primary bus current

$I_{exp}$ = expansion bus current

The remainder of this section describes in detail the calculation of external bus 
current factors. 
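

As a minimal sketch, the external bus sum can be expressed in Python as
follows. The function name is hypothetical, and the primary and expansion
inputs shown (68 mA and 34 mA) are example values only; in practice they come
from the graphs and scale factors described in the rest of this section.

```python
BASELINE_MA = 60.0  # 60-mA baseline current for the external buses


def external_bus_current(primary_ma, expansion_ma):
    """Supply current in mA for the external buses: Ibase + Iprim + Iexp."""
    return BASELINE_MA + primary_ma + expansion_ma


# Example values only: 68 mA from the primary bus, 34 mA from the expansion bus.
print(external_bus_current(68.0, 34.0))  # 162.0
```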
12.3.1 Primary Bus Current


The current from primary bus writes varies with both wait states and write cycle
time. Current factors from output driver circuitry are represented as offsets
from the previously computed value (quiescent + internal operations + internal
bus). Since the baseline value is related to internal current factors, negative
values for current offset are obtained under some circumstances. However,
negative current does not actually occur.


To obtain accurate current values, you must first establish the timing of write 
cycles of the buses. To determine the rate and timings at which write cycles 
to the external buses occur, you must analyze program activity, including any 
pipeline conflicts that may exist. Information from this manual and the ’C30 
emulator or simulator is useful in making these determinations. You must 
account for the effects of cache use in these analyses because the cache can 
affect whether instructions are fetched from external memory. 


When evaluating external write activity in a given program segment, you must
consider whether a particular level of external write activity is significant. If
writes are performed at very slow rates on both the primary and the expansion
buses, the current from external writes can be ignored. If writes are performed
at high speed, even on only one of the two external buses, you should calculate
the current requirements.


Although you can obtain negative incremental current values under some 
circumstances, the total contribution for external buses, including baseline 
current, is always positive. When external buses are not used much, the total 
current requirements approach the current contribution from the internal fac- 
tors, which is solely a function of internal activity. This places a lower limit on 
current contributions from the primary and expansion buses, because the total 
current from external buses is the sum of the 60-mA baseline value and the 
primary and expansion bus factors. This effect is discussed in further detail in 
the rest of this section. 


Once you establish bus-write cycle timing, use Figure 12-4 to determine the
contribution to supply current from this bus activity. Figure 12-4 shows current
contributions from the primary bus for various numbers of wait states and H1
cycles between writes. This current contribution is exhibited when writes of
alternating 55555555h and AAAAAAAAh are performed at a capacitive load of
80 pF per output signal line. This condition exhibits the highest current values
on the device. The curves in the figure represent incremental or additional
current contributed by the primary bus output driver circuitry while writing
alternating 55555555h and AAAAAAAAh. Current values obtained from this graph
are scaled and added to several other current values to calculate the total
current for the device. As indicated in the figure, the lower curve represents
the current contribution for 18 or more cycles between writes.
Figure 12-4. Primary Bus Current Versus Transfer Rate and Wait States
(Horizontal axis: wait states)


The number of cycles between writes refers to the number of H1 cycles between
the active portions of the write cycles (as defined in the TMS320C30 Digital
Signal Processor data sheet), that is, when STRB, MSTRB, or IOSTRB and
R/W (or XR/W, as the case may be) are low between H1 cycles. As shown in
Figure 12-4, the minimum number of cycles between writes is 1, because with
back-to-back writes there is one H1 cycle between active portions of the writes.


To further illustrate the relationship between current and write cycle time, 
Figure 12-5 shows the characteristics of current for various numbers of cycles 
between writes for zero wait states. You can use the information on this curve 
to obtain more precise values of current if zero wait states are used and the 
number of cycles between writes does not fall on one of the curves in 
Figure 12-4. 
Figure 12-5. Primary Bus Current Versus Transfer Rate at Zero Wait States
(Horizontal axis: H1 cycles between writes)


Although these graphs contain negative current values, negative current does
not actually occur. The negative values exist because the graphs represent a
current offset from the previously computed current value. Depicting the
current contributions from different factors in this way breaks down the
current calculation so that you can evaluate each factor independently.


Figure 12-4 and Figure 12-5 show that the current consumption during external
bus writes is negative if writes are performed at intervals of more than 18
cycles. Under these conditions, use the incremental value of -30 mA as the
current contribution from the primary bus. You should use a value of -30 mA
only if the expansion bus is used extensively because the total contribution for
external buses, including baseline current, must always be positive. If the
expansion bus is not used and the primary bus is not used much, the current
contribution from the primary bus is always greater than or equal to 20 mA. This
ensures that the correct total current value is obtained when summing external
bus factors. Once a current value has been obtained from Figure 12-4 or
Figure 12-5, this value can, if necessary, be scaled by a data dependency
factor, as described in section 12.3.3 on page 12-14. This scaled value is then
summed along with several other current values to determine the total supply
current.
12.3.2 Expansion Bus Current


Currents from the primary and expansion buses differ slightly for several
reasons, including the fact that the expansion bus has 11 fewer address outputs
than the primary bus (13 rather than 24). As a result, the overall current
contribution is slightly lower from the expansion bus than from the primary bus.


Determining the expansion bus current uses the same premise as determining
the primary bus current. Figure 12-6 and Figure 12-7 show the same current
relationships for the expansion bus as Figure 12-4 and Figure 12-5 show for
the primary bus. The total current contribution of the external buses must be
positive; if the primary bus is not used and the expansion bus is not used much,
the minimum current contribution from the expansion bus is -30 mA. The
current values obtained from these figures must be scaled by a data dependency
factor, as described in section 12.3.3 on page 12-14.


Figure 12-6. Expansion Bus Current Versus Transfer Rate and Wait States 
Figure 12-7. Expansion Bus Current Versus Transfer Rate at Zero Wait States
12.3.3 Data Dependency Factors


Data dependency of current for the primary and expansion buses is expressed
as a scale factor that is a percentage of the maximum current of either of the
two buses. Data dependencies are shown in Figure 12-8 for the primary bus
and in Figure 12-9 for the expansion bus.


These two figures show normalized weighting factors that you can use to scale
current requirements on the basis of patterns in data being written on the
external buses. The range of possible weighting factors forms a trapezoidal
pattern bounded by extremes of data values. As can be seen from Figure 12-8 and
Figure 12-9, the minimum current is exhibited by writing all 0s, while the
maximum current occurs when writing alternating 55555555h and AAAAAAAAh.
This condition results in a weighting factor of 1, which corresponds to using the
values from Figure 12-4 and/or Figure 12-5 directly.


As with internal bus operations, data dependencies for the external buses are
well defined, but accurate prediction of data patterns is often impractical.
Unless you have precise knowledge of data patterns, you should use an estimate
of a median or average value for the scale factor. If you assume that the data is
neither alternating 5s and As nor all 0s, and that it varies randomly, a value of
0.85 is appropriate. Otherwise, if you prefer a conservative approach, you can
use a value of 1.0 as an upper bound.
Figure 12-8. Primary Bus Current Versus Data Complexity Derating Curve


Figure 12-9. Expansion Bus Current Versus Data Complexity Derating Curve
Regardless of the approach you take for scaling, once you determine the scale
factors for primary and expansion buses, apply these scale factors to the
current values found by using the graphs in the previous two sections. For
example, if a nominal scale factor of 0.85 is used and the system uses zero wait
states with two cycles between accesses on both the primary and expansion
buses, the current contribution from the two buses is as follows:


Primary:   0.85 × 80 mA = 68 mA
Expansion: 0.85 × 40 mA = 34 mA
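

The scaling in this example can be sketched in Python as shown below. The
function name is hypothetical; the 80-mA and 40-mA inputs are the graph
readings quoted above, and 0.85 is the nominal scale factor suggested in the
text.

```python
def apply_data_scale(max_current_ma, scale_factor=0.85):
    """Scale a worst-case (alternating data) bus current by a data factor."""
    if not 0.0 <= scale_factor <= 1.0:
        raise ValueError("scale_factor must be between 0 and 1")
    return scale_factor * max_current_ma


primary_ma = apply_data_scale(80.0)    # graph reading for the primary bus
expansion_ma = apply_data_scale(40.0)  # graph reading for the expansion bus
print(round(primary_ma, 2), round(expansion_ma, 2))  # 68.0 34.0
```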


12.3.4 Capacitive Load Dependence 


Once you account for cycle timing and data dependencies, calculate and apply 
the capacitive loading effects. Figure 12-10 shows the scale factor to apply to 
the current values obtained above as a function of actual load capacitance if 
the load capacitance presented to the buses is less than 80 pF. 


In the previous example, if the load capacitance is 20 pF instead of 80 pF, a
scale factor of 0.84 is used, yielding:


Primary:   0.84 × 68 mA = 57.12 mA
Expansion: 0.84 × 34 mA = 28.56 mA


The slope of the load capacitance line in Figure 12-10 is 0.26% normalized IDD
per pF. While this slope may be used to extrapolate scale factors for loads
greater than 80 pF, the ’C30 is specified to drive output loads of less than
80 pF. Interface timings cannot be ensured at higher loads.
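

The load capacitance scale factor can be approximated with a straight line, as
in the Python sketch below. Two assumptions are made here: the line passes
through a scale factor of 1.0 at the 80-pF reference load, and the slope is
0.26% per pF, which is the value consistent with the 0.84 factor quoted for
20 pF. The function and constant names are hypothetical; read the figure
directly when precise values are needed.

```python
REFERENCE_LOAD_PF = 80.0  # load used for the current measurements
SLOPE_PER_PF = 0.0026     # assumed slope: 0.26% of normalized current per pF


def load_scale_factor(load_pf):
    """Approximate scale factor for a given load capacitance in pF."""
    if load_pf <= 0:
        raise ValueError("load capacitance must be positive")
    return 1.0 + SLOPE_PER_PF * (load_pf - REFERENCE_LOAD_PF)


print(round(load_scale_factor(20.0), 3))  # 0.844
print(round(load_scale_factor(80.0), 3))  # 1.0
```

The result for 20 pF (0.844) agrees with the rounded value of 0.84 used in the
example above.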


Figure 12-10. Current Versus Output Load Capacitance 
12.4 Calculation of Total Supply Current


The previous sections discuss currents contributed by several sources on the
’C30. Because actual current values are unique and independent for each
source, each current source is discussed separately. In an actual application,
however, the sum of the independent contributions from each current source
determines the total current requirement for the device. This current value is
the total current supplied to the device through all of the VDD inputs and
returned through the VSS connections.


Note that numerous VDD and VSS pins on the device are routed to a variety of
internal connections, not all of which are common. Externally, however, all of
these pins must be connected in parallel to a 5-volt source and use ground
planes with as little impedance as possible.


12.4.1 Combining Supply Current from All Factors 


To determine the total supply current requirements for any given program 
activity, calculate each of the appropriate factors and combine them in the fol- 
lowing sequence: 


1) Start with 110-mA quiescent current. 


2) Add 55 mA for internal operations unless the device is dormant. Dormant 
periods occur during the execution of IDLE, NOPs, branches to self, or 
performance of internal and/or external bus operations using an RPTS 
instruction (see section 12.2.2 on page 12-5). Internal or external bus 
operations executed through RPTS do not contribute an internal opera- 
tions power supply current factor. However, current factors in the next two 
steps may still be required, even though the 55 mA is omitted. 


3) If significant internal bus operations are performed, add the calculated cur- 
rent value. (See section 12.2.3 on page 12-5.) 


4) If external writes are performed at high speed, add 60 mA and then add 
the values for primary and expansion bus current factors. (See sec- 
tion 12.3 on page 12-9.) If only one external bus is used, the appropriate 
incremental current for the unused bus must still be included because the 
current offsets include factors required for operating both buses. The total 
current contribution for external buses, including baseline, is always posi- 
tive. 


The current value obtained from summing these factors is the total device 
current requirement for a given program activity.

12.4.2 Supply Voltage, Operating Frequency, and Temperature Dependencies


Current dependencies specific to each supply current factor (such as internal
or external bus operations) are discussed in section 12.1.2 on page 12-2.
Supply voltage level, operating temperature, and operating frequency affect
the requirements for the total supply current and must be maintained within the
required device specifications.


Once you determine the total current for a particular program segment, the
dependencies that affect the total current requirements are applied as a scale
factor in the same manner as data dependencies discussed in other sections.
Figure 12-11 shows the relative scale factors for the supply current values as
a function of both VDD and operating frequency.


Power supply current consumption does not vary significantly with operating
temperature. However, a scale factor of 2% normalized IDD per 50°C change
in operating temperature may be used to derate current within the specified
range noted in the TMS320C30 Digital Signal Processor data sheet. This
temperature dependence is shown graphically in Figure 12-12. A temperature
scale factor of 1.0 corresponds to current values at 25°C, which is the
temperature for all references in the document.


Figure 12-11. Current Versus Frequency and Supply Voltage

Figure 12-12. Current Versus Operating Temperature Change

12.4.3 Total Current Equation Example


The procedure for determining the power supply current requirement is sum- 
marized in the following equation: 


I = (Iq + Iiops + Iibus + Ixbus) x FV x T


where:
Iq = 110 mA
Iiops = 55 mA
Iibus = D1 x f1 (see Table 12-1 on page 12-20)
Ixbus = Ibase + Iprim + Iexp
with
Ibase = 60 mA
Iprim = D2 x C2 x f2 (see Table 12-1)
Iexp = D3 x C3 x f3 (see Table 12-1)
FV = scale factor for frequency and supply voltage
T = scale factor for operating temperature


Table 12-1 describes the variables used in the power supply current equation.
The table displays figure numbers from which the value can be obtained.

Table 12-1. Current Equation Variables

Variable   Description                                   Graph/Value
Iq         Quiescent current                             110 mA
Iiops      Internal operations current                   55 mA
Iibus      Internal bus operations current               †
D1         Internal bus data scale factor                Figure 12-3
f1         Internal bus current requirement              Figure 12-2
Ixbus      External bus operations current               †
Ibase      External bus base current                     60 mA
Iprim      Primary bus operations current                †
D2         Primary bus data scale factor                 Figure 12-8
C2         Primary bus capacitance load scale factor     Figure 12-10
f2         Primary bus current requirement               Figure 12-4 or Figure 12-5
Iexp       Expansion bus operations current              †
D3         Expansion bus data scale factor               Figure 12-9
C3         Expansion bus capacitance load scale factor   Figure 12-10
f3         Expansion bus current requirement             Figure 12-6 or Figure 12-7
FV         Frequency/supply voltage scale factor         Figure 12-11
T          Temperature scale factor                      Figure 12-12

† See power supply current equation on page 12-19.
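
The total current equation and the combining steps of section 12.4.1 can be sketched as a small Python function. This is an illustration under stated assumptions, not a TI tool: the function and parameter names are hypothetical, the bus terms are expected to be already scaled by their data and capacitance factors, and the sample call pairs values from different examples in this chapter purely for demonstration.

```python
IQ_MA = 110.0      # quiescent current
IIOPS_MA = 55.0    # internal operations current
IBASE_MA = 60.0    # external bus base current


def total_supply_current_ma(ibus_ma=0.0, iprim_ma=0.0, iexp_ma=0.0,
                            dormant=False, external_bus_active=False,
                            fv=1.0, t=1.0):
    """I = (Iq + Iiops + Iibus + Ixbus) x FV x T, in milliamperes.

    dormant=True omits the 55 mA internal operations term (IDLE, NOPs,
    branches to self, or bus operations executed through RPTS).
    """
    total = IQ_MA
    if not dormant:
        total += IIOPS_MA
    total += ibus_ma
    if external_bus_active:
        # The 60 mA base is added once; an unused bus still contributes
        # its (possibly negative) incremental value through iprim/iexp.
        total += IBASE_MA + iprim_ma + iexp_ma
    return total * fv * t


# Hypothetical segment: scaled internal bus current plus both external buses.
example_ma = total_supply_current_ma(ibus_ma=0.8 * 55.0, iprim_ma=57.12,
                                     iexp_ma=28.56, external_bus_active=True)
print(round(example_ma, 2))
```

Keeping FV and T as explicit multipliers mirrors the equation, so a value read from Figure 12-11 or Figure 12-12 can be applied without changing the sum.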


12.4.4 Peak Versus Average Current 


If current is observed over the course of an entire program, some segments usu- 
ally exhibit significantly different levels of current required for different durations 
of time. For example, a program may spend 80% of its time performing internal 
operations, drawing a current of 250 mA; it may spend the remaining 20% of its 
time performing writes at full speed to the expansion bus, drawing 300 mA. 


While knowledge of peak current levels is important in order to establish power 
supply requirements, some applications require information about average 
current. This is particularly significant if periods of high peak current are short 
in duration. Average current can be obtained by performing a weighted sum 
of the currents from the various independent program segments over time. In 
the example above, the average current can be calculated as follows: 


Iavg = 0.8 x 250 mA + 0.2 x 300 mA = 260 mA


Using this approach, you can calculate average current for any number of pro- 
gram segments. 
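
The weighted sum extends to any number of segments. The following Python sketch shows one way to compute it; the function name and the (fraction, current) pair layout are assumptions made for illustration.

```python
def average_current_ma(segments):
    """Time-weighted average of (time_fraction, current_ma) pairs."""
    total_fraction = sum(fraction for fraction, _ in segments)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("time fractions must add up to 1.0")
    return sum(fraction * current for fraction, current in segments)


# The example above: 80% of the time at 250 mA, 20% at 300 mA.
avg_ma = average_current_ma([(0.8, 250.0), (0.2, 300.0)])
print(round(avg_ma, 1))
```

The check on the fractions catches the common mistake of leaving a program segment out of the list.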


12.4.5 Thermal Management Considerations 


Heating characteristics of the ’C30 depend on power dissipation, which in turn
depends on power supply current. When you make thermal management
calculations, you must consider how power supply current contributes to power
dissipation and to the time constant of the ’C30 package thermal
characteristics.


Depending on sources and destinations of current on the device, some current
contributions to IDD do not constitute a factor of power dissipation at 5 V.
Accordingly, if you use the total current flowing into VDD to calculate power
dissipation at 5 V, you obtain erroneously large values for power dissipation.
Power dissipation is defined as:


P = I x V
where: 
P = power 
I = current 
V = voltage 


If device outputs are driving any dc load to a logic high level, only a minor
contribution is made to power dissipation, because CMOS outputs typically drive
to a level within a few tenths of a volt of the power supply rails. If this is
the case, subtract these current factors out of the total supply current value;
then calculate their contribution to power dissipation separately and add it to
the total power dissipation (see Figure 12-13). If this is not done, the
currents resulting from driving a logic high level into a dc load cause
unrealistically high power dissipation values. The error occurs because these
currents appear as a portion of the current used to calculate power dissipation
from VDD at 5 volts.


Figure 12-13. Load Currents


Furthermore, external loads draw supply current only when outputs are driven
high because, when outputs are in the logic 0 state, the device is sinking
current that is supplied from an external source. The power dissipation from
this current factor does not have a contribution through IDD but contributes to
power dissipation with a magnitude of:


P = VOL x IOL
where:
VOL = low-level output voltage
IOL = current being sunk by the output (as shown in Figure 12-13)


The power dissipation factor from outputs that are driven low must be calcu- 
lated and added to the total power dissipation. 


When outputs with dc loads are switched, the power dissipation factors from 
outputs being driven high and outputs being driven low are averaged and add- 
ed to the total device power dissipation. You should calculate power factors 
from dc loading of the outputs separately for each program segment before 
you calculate average power. 
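
The dc load corrections described above can be sketched in Python as follows. All names and sample values are hypothetical. The sketch takes the dissipation of an output driven high as (VDD - VOH) x IOH, that is, the voltage dropped across the output stage times the load current, and the dissipation of an output driven low as VOL x IOL.

```python
def power_dissipation_w(idd_a, vdd_v=5.0, high_loads=(), low_loads=()):
    """Estimate device power dissipation with dc output loads.

    high_loads: (voh_v, ioh_a) pairs for outputs sourcing current.
    low_loads:  (vol_v, iol_a) pairs for outputs sinking current.
    """
    # Current delivered to high-level loads passes through IDD but is
    # mostly dissipated in the load, so remove it from the core term.
    i_high = sum(ioh for _, ioh in high_loads)
    p_core = (idd_a - i_high) * vdd_v
    p_high = sum((vdd_v - voh) * ioh for voh, ioh in high_loads)
    # Sunk current does not flow through IDD; add its dissipation.
    p_low = sum(vol * iol for vol, iol in low_loads)
    return p_core + p_high + p_low


# Hypothetical segment: 250 mA supply current, one output sourcing 10 mA
# at 4.7 V and one output sinking 2 mA at 0.3 V.
p_w = power_dissipation_w(0.250, high_loads=[(4.7, 0.010)],
                          low_loads=[(0.3, 0.002)])
naive_w = 0.250 * 5.0   # overestimate obtained by ignoring the corrections
print(round(p_w, 4), naive_w)
```

When outputs switch, you can evaluate the function once for the high state and once for the low state and average the two results for that program segment.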


Any unused inputs that are left disconnected may float to a voltage level that 
causes input buffer circuits to remain in the linear region and therefore contrib- 
ute a significant factor to power supply current. Accordingly, you should deacti- 
vate any unused inputs by grounding them or pulling them high if you desire 
absolute minimum power dissipation. If you must pull several unused inputs 
high, pull them high together using one resistor to minimize component count 
and board space. 


When you use power dissipation values to determine thermal requirements, 
you should use the average power unless the time duration of individual pro- 
gram segments is long. The thermal characteristics of the C30 in the 181-pin 
grid array (PGA) package are exponential in nature, with a time constant of 
t = 4.5 minutes. When subjected to a change in power, the temperature of the 
device package will, after 4.5 minutes, reach approximately 63% of the total 
temperature change. Accordingly, if the time duration of program segments 
exhibiting high power dissipation values is short (on the order of a few 
seconds), you can use average power, calculated in the same manner as aver- 
age current (as described in section 12.4.4 on page 12-20). 


Otherwise, you should calculate maximum device temperature on the basis of 
the actual time duration of the program segments involved. For example, if a 
particular program segment lasts for seven minutes, you can calculate that a 
device will reach approximately 80% of the temperature change from the total 
power dissipation during the program segment. 
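
The 63% and 80% figures follow from a first-order exponential response. The Python sketch below assumes that the package temperature follows 1 - exp(-t / 4.5 minutes) after a step change in power; the function name is illustrative.

```python
import math


def fraction_of_temperature_change(minutes, time_constant_min=4.5):
    """Fraction of the final temperature change reached after a power step."""
    return 1.0 - math.exp(-minutes / time_constant_min)


after_one_tau = fraction_of_temperature_change(4.5)   # about 0.63
after_seven_min = fraction_of_temperature_change(7.0) # about 0.79
print(round(after_one_tau, 3), round(after_seven_min, 3))
```

A segment lasting only a few seconds yields a fraction close to zero, which is why average power is sufficient for short bursts.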


You can determine average power by calculating the power for each program
segment (including the previous considerations) and performing a time average
of these values, rather than simply multiplying the average current as
determined in the previous section by VDD.


Specific device temperature calculations are made using the ’C30 thermal 
impedance characteristics in the TMS320C30 Digital Signal Processor data 
sheet.

12.5 Example Supply Current Calculations


12.5.1 Processing

A fast Fourier transform (FFT) is a typical DSP algorithm. The FFT code in the
example calculation processes data in the RAM blocks and writes the result 
out to zero-wait-state external SRAM on the primary bus. The program 
executes out of zero-wait-state external SRAM on the primary bus, and 
enables the 'C30’s cache. The entire algorithm consists mainly of internal bus 
operations and includes quiescent current and internal operations. At the end 
of processing, the 1024 results are written to the primary bus. Therefore, the 
algorithm exhibits a higher current requirement during the write portion, where 
the external bus is used significantly. 


The processing portion of the algorithm is 95% of the FFT execution. During 
this portion, the power supply current is required only for the internal circuitry. 
Data is processed in several loops. During these loops, two operands are 
transferred on every cycle. The current required for internal bus operations is
55 mA (see section 12.2.2 on page 12-5). The data is assumed to be random.
A data value scale factor of 0.8 is used from Figure 12-3 on page 12-7.
This value scales 55 mA, yielding 44 mA for internal bus operations. Adding 
44 mA to the quiescent current requirement and internal operations current 
requirement yields a current requirement of 209 mA for the major portion of the 
algorithm. 


I = Iq + Iiops + Iibus


I = 110 mA + 55 mA + (55 mA)(0.8) = 209 mA


12.5.2 Data Output 


The portion of the FFT corresponding to writing out data is approximately 5% 
of the total processing time. Again, the data being written is assumed to be ran- 
dom. From Figure 12-3 on page 12-7 and Figure 12-8 on page 12-15, scale 
factors of 0.80 and 0.85 are used for derating from data value dependency for 
internal and primary buses, respectively. During the data dump portion of the 
code, a load and store are performed every cycle. The parallel load/store 
instruction is in an RPTS loop, so there is no contribution from internal
operations because the instruction is fetched only once. The only internal
contributions are from quiescent current and internal bus operations.
Figure 12-5 on page 12-12 indicates a 170-mA current contribution from
back-to-back zero-wait-state writes, and Figure 12-7 on page 12-14 indicates
a -80-mA contribution when the expansion bus is idle (that is, with more than
18 H1 cycles between writes). The total contribution from this portion of the
code is:


I = Iq + Iibus + Ixbus
or


I = 110 mA + (55 mA)(0.8) + 60 mA - 80 mA + (170 mA)(0.85) = 278.5 mA


12.5.3 Average Current 


The average current is derived from the two portions of the FFT. The proces- 
sing portion takes 95% of the time and requires about 210 mA, and the data 
dump portion takes the other 5% and requires about 280 mA. The average is 
calculated as: 


Iavg = (0.95)(210 mA) + (0.05)(280 mA) = 213.5 mA


From the thermal characteristics specified in the C30 data sheet, it can be 
shown that this current level corresponds to a case temperature of 43°C. This 
temperature meets the maximum device specification of 85°C and, hence, 
requires no forced air cooling.

12.5.4 Experimental Results


A photograph of the power supply current for the FFT is shown in Figure 12-14.
During the FFT processing, the measured current varies between 180 and
220 mA. The peak of the current during external writes is 270 mA, and the
average current requirement, as measured on a digital multimeter, is 200 mA.
The calculated values (209 mA, 278.5 mA, and 213.5 mA average) are within
about 7% of the measured power supply current.
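
The arithmetic of the FFT example can be replayed in a few lines of Python to see how the calculated average compares with the multimeter reading. The variable names are illustrative; every number comes from sections 12.5.1 through 12.5.4.

```python
IQ_MA, IIOPS_MA, IBASE_MA = 110.0, 55.0, 60.0

ibus_ma = 55.0 * 0.8                              # internal bus, random data
processing_ma = IQ_MA + IIOPS_MA + ibus_ma        # 95% of execution time
data_output_ma = (IQ_MA + ibus_ma + IBASE_MA      # RPTS loop: no Iiops term
                  - 80.0                          # idle expansion bus
                  + 170.0 * 0.85)                 # back-to-back primary writes

# The text rounds the two segments to 210 mA and 280 mA before averaging.
average_ma = 0.95 * 210.0 + 0.05 * 280.0

measured_average_ma = 200.0
error_pct = 100.0 * (average_ma - measured_average_ma) / measured_average_ma
print(processing_ma, data_output_ma, round(average_ma, 1), round(error_pct, 2))
```

The calculation overestimates the measured average slightly, which is the safer direction when sizing a power supply.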


Figure 12-14. Photo of IDD for FFT

TMS320C32 Boot Table Examples


The ’C32 boot loader loads programs received from standard memory devices
or through the serial port. These programs have a particular data stream
structure called a boot table. This appendix shows examples of ’C32 boot
tables stored in 32-, 16-, and 8-bit-wide ROM and of a boot table transmitted
through the serial port.


Figure A-1 through Figure A-4 show four instances of the boot table, each
containing four blocks. The destination for the first and third block of each boot
table is 16-bit STRB0 memory. The second block is booted to the 32-bit
IOSTRB memory. Block 4 is destined for the 8-bit memory in the STRB1
portion of the memory map.


Each figure represents a boot from a different source medium. In Figure A-1,
the boot table resides in the 32-bit IOSTRB memory. It is selected when the
INT1 pin is driven low after reset in the microcontroller/boot-loader mode. The
boot table in Figure A-2 is stored in the 16-bit STRB0 memory (selected by
INT0). The boot table in Figure A-3 resides in the 8-bit STRB1 memory
(selected by INT2). The final example, shown in Figure A-4, represents the boot
table stored in the host memory before being sent to the ’C32 over the serial
port. Unlike the boot from memory, the serial port boot table omits the memory
width control word from the beginning of the table.
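
The table layout used in Figure A-1 can be walked with a short script. The following Python sketch is illustrative only and is not the boot loader's code. It assumes a table held as a list of 32-bit words in the Figure A-1 layout: a memory width word, three words that the sketch treats as strobe control values (an assumption based on the three strobe control registers that the boot loader initializes), then blocks made of a size, a destination address, a strobe control word, and one data element per table word, ended by a size of 0. The function and field names are hypothetical, and the denser packing used in 16-bit and 8-bit ROM is not handled.

```python
def parse_boot_table(words, has_width_word=True):
    """Split a 32-bit-wide boot table into its header and blocks."""
    pos = 0
    width = None
    if has_width_word:               # omitted for a serial port boot
        width = words[pos]
        pos += 1
    strobes = list(words[pos:pos + 3])
    pos += 3
    blocks = []
    while words[pos] != 0:           # a block size of 0 ends the table
        size, dest, strobe = words[pos:pos + 3]
        pos += 3
        data = list(words[pos:pos + size])
        if len(data) != size:
            raise ValueError("boot table is truncated")
        pos += size
        blocks.append({"size": size, "dest": dest,
                       "strobe": strobe, "data": data})
    return width, strobes, blocks


# Boot table words transcribed from Figure A-1.
FIGURE_A1 = [
    0x00000020, 0x100000F8, 0x200510F8, 0x300010F8,
    6, 0x00001400, 0x0510F864,
    0xBB1D, 0xBB2D, 0xBB3D, 0xBB4D, 0xBB5D, 0xBB6D,
    4, 0x00810400, 0x0000F860,
    0xDDCCBB1E, 0xDDCCBB2E, 0xDDCCBB3E, 0xDDCCBB4E,
    6, 0x00880400, 0x0510F864,
    0xBB1F, 0xBB2F, 0xBB3F, 0xBB4F, 0xBB5F, 0xBB6F,
    8, 0x00900400, 0x0010F868,
    0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80,
    0,
]

width, strobes, blocks = parse_boot_table(FIGURE_A1)
entry_point = blocks[0]["dest"]      # execution starts at the first block
for block in blocks:
    print(f"{block['size']} words -> {block['dest']:06X}h")
```

Passing `has_width_word=False` together with the same list minus its first word models the serial port table of Figure A-4.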


The shaded areas of the boot table examples represent the contents of the in- 
dividual blocks of code or data. The unshaded portions are the control words 
that instruct the boot loader program to transfer the blocks to the memory map.

Figure A-1. Boot From a 32-Bit-Wide ROM to 8-, 16-, and 32-Bit-Wide RAM

Source     Boot         Destination   Block
address    table        address       data
810 000    0000 0020
810 001    1000 00F8
810 002    2005 10F8
810 003    3000 10F8
810 004    6
810 005    0000 1400
810 006    0510 F864    Block 1  16-bit-wide external RAM
810 007    0000 BB1D    001 400       BB1D
810 008    0000 BB2D    001 401       BB2D
810 009    0000 BB3D    001 402       BB3D
810 00A    0000 BB4D    001 403       BB4D
810 00B    0000 BB5D    001 404       BB5D
810 00C    0000 BB6D    001 405       BB6D
810 00D    4
810 00E    0081 0400
810 00F    0000 F860    Block 2  32-bit-wide on-chip RAM
810 010    DDCC BB1E    810 400       DDCC BB1E
810 011    DDCC BB2E    810 401       DDCC BB2E
810 012    DDCC BB3E    810 402       DDCC BB3E
810 013    DDCC BB4E    810 403       DDCC BB4E
810 014    6
810 015    0088 0400
810 016    0510 F864    Block 3  16-bit-wide external RAM
810 017    0000 BB1F    880 400       BB1D
810 018    0000 BB2F    880 401       BB2D
810 019    0000 BB3F    880 402       BB3D
810 01A    0000 BB4F    880 403       BB4D
810 01B    0000 BB5F    880 404       BB5D
810 01C    0000 BB6F    880 405       BB6D
810 01D    8
810 01E    0090 0400
810 01F    0010 F868    Block 4  8-bit-wide external RAM
810 020    0000 0010    900 400       10
810 021    0000 0020    900 401       20
810 022    0000 0030    900 402       30
810 023    0000 0040    900 403       40
810 024    0000 0050    900 404       50
810 025    0000 0060    900 405       60
810 026    0000 0070    900 406       70
810 027    0000 0080    900 407       80
810 028    0

Figure A-2. Boot From a 16-Bit-Wide ROM to 8-, 16-, and 32-Bit-Wide RAM

Source     Boot      Destination   Block
address    table     address       data
001 000    10
001 001    00
001 002    00F8
001 003    1000
001 004    10F8
001 005    2005
001 006    10F8
001 007    3000
001 008    6
001 009    0
001 00A    1400
001 00B    0000
001 00C    F864
001 00D    0510      Block 1
001 00E    AA11      001 400       AA11
001 00F    AA22      001 401       AA22
001 010    AA33      001 402       AA33
001 011    AA44      001 403       AA44
001 012    AA55      001 404       AA55
001 013    AA66      001 405       AA66
001 014    4
001 015    0
001 016    0400
001 017    0081
001 018    F860
001 019    0000      Block 2
001 01A    DD11      810 400       BBCC DD11
001 01B    BBCC      810 401       BBCC DD22
001 01C    DD22      810 402       BBCC DD33
001 01D    BBCC      810 403       BBCC DD44
001 01E    DD33
001 01F    BBCC
001 020    DD44
001 021    BBCC
001 022    6
001 023    0
001 024    0400
001 025    0088
001 026    F864
001 027    0510      Block 3
001 028    EE11      880 400       EE11
001 029    EE22      880 401       EE22
001 02A    EE33      880 402       EE33
001 02B    EE44      880 403       EE44
001 02C    EE55      880 404       EE44
001 02D    EE66      880 405       EE55
001 02E    8
001 02F    0
001 030    0400
001 031    0090
001 032    F868
001 033    0010      Block 4
001 034              900 400       F1
001 035              900 401       F2
001 036              900 402       F3
001 037              900 403       F4
001 038              900 404       F5
001 039              900 405       F6
001 03A              900 406       F7
001 03B              900 407       F8
001 03C
001 03D
Figure A-3. Boot From a Byte-Wide ROM to 8-, 16-, and 32-Bit-Wide RAM

Source     Boot      Destination   Block
address    table     address       data
900 000    08
900 001    00
900 002    00
900 003    00
900 004    F8
900 005    00
900 006    00
900 007    10
900 008    F8
900 009    10
900 00A    05
900 00B    20
900 00C    F8
900 00D    10
900 00E    00
900 00F    30
900 010    6
900 011    0
900 012    0
900 013    0
900 014    00
900 015    14
900 016    00
900 017    00
900 018    64
900 019    F8
900 01A    10
900 01B    05        Block 1
900 01C    11        001 400       AA11
900 01D    AA        001 401       AA22
900 01E    22        001 402       AA33
900 01F    AA        001 403       AA44
900 020    33        001 404       AA55
900 021    AA        001 405       AA66
900 022    44
900 023    AA
900 024    55
900 025    AA
900 026    66
900 027    AA
900 028    4
900 029    0
900 02A    0
900 02B    0
900 02C    00
900 02D    04
900 02E    81
900 02F    00
900 030    60
900 031    F8
900 032    00
900 033    00        Block 2
900 034    11        810 400       BBCC DD11
900 035    DD        810 401       BBCC DD22
900 036    CC        810 402       BBCC DD33
900 037    BB        810 403       BBCC DD44
900 038    22
900 039    DD
900 03A    CC
900 03B    BB
900 03C    33
900 03D    DD
900 03E    CC
900 03F    BB
900 040    44
900 041    DD
900 042    CC
900 043    BB
900 044    6
900 045    0
900 046    0
900 047    0
900 048    00
900 049    04
900 04A    88
900 04B    00
900 04C    64
900 04D    F8
900 04E    10
900 04F    05        Block 3
900 050    11        880 400       AA11
900 051    EE        880 401       AA22
900 052    22        880 402       AA33
900 053    EE        880 403       AA44
900 054    33        880 404       AA55
900 055    EE        880 405       AA66
900 056    44
900 057    EE
900 058    55
900 059    EE
900 05A    66
900 05B    EE
900 05C    8
900 05D    0
900 05E    0
900 05F    0
900 060    00
900 061    04
900 062    90
900 063    00
900 064    68
900 065    F8
900 066    10
900 067    00        Block 4
900 068    F1        900 400       F1
900 069    F2        900 401       F2
900 06A    F3        900 402       F3
900 06B    F4        900 403       F4
900 06C    F5        900 404       F5
900 06D    F6        900 405       F6
900 06E    F7        900 406       F7
900 06F    F8        900 407       F8
900 070    0
900 071    0
900 072    0
900 073    0

Figure A-4. Boot From Serial Port to 8-, 16-, and 32-Bit-Wide RAM

Source     Boot         Destination   Block
address    table        address       data
808 04C    1000 00F8
808 04C    2005 10F8
808 04C    3000 10F8
808 04C    6
808 04C    0000 1400
808 04C    0510 F864    Block 1
808 04C    0000 BB1D    001 400       BB1D
808 04C    0000 BB2D    001 401       BB2D
808 04C    0000 BB3D    001 402       BB3D
808 04C    0000 BB4D    001 403       BB4D
808 04C    0000 BB5D    001 404       BB5D
808 04C    0000 BB6D    001 405       BB6D
808 04C    4
808 04C    0081 0400
808 04C    0000 F860    Block 2
808 04C    DDCC BB1E    810 400       DDCC BB1E
808 04C    DDCC BB2E    810 401       DDCC BB2E
808 04C    DDCC BB3E    810 402       DDCC BB3E
808 04C    DDCC BB4E    810 403       DDCC BB4E
808 04C    6
808 04C    0088 0400
808 04C    0510 F864    Block 3
808 04C    0000 BB1F    880 400       BB1D
808 04C    0000 BB2F    880 401       BB2D
808 04C    0000 BB3F    880 402       BB3D
808 04C    0000 BB4F    880 403       BB4D
808 04C    0000 BB5F    880 404       BB5D
808 04C    0000 BB6F    880 405       BB6D
808 04C    8
808 04C    0090 0400
808 04C    0010 F868    Block 4
808 04C    0000 0010    900 400       10
808 04C    0000 0020    900 401       20
808 04C    0000 0030    900 402       30
808 04C    0000 0040    900 403       40
808 04C    0000 0050    900 404       50
808 04C    0000 0060    900 405       60
808 04C    0000 0070    900 406       70
808 04C    0000 0080    900 407       80
808 04C    0000 0000

TMS320C32 Boot Loader Operations


This appendix contains the source code and boot loader opcodes for the ’C32. 
It also describes the on-chip boot loader program that initializes the DSP sys- 
tem following power up or reset. 


Topic                                                        Page
B.1  TMS320C32 Boot Loader Source Code Description ........ B-2
B.2  TMS320C32 Boot Loader Opcodes ........................ B-4
B.3  Boot Loader Source Code Listing ...................... B-6

B.1 TMS320C32 Boot Loader Source Code Description

Figure B-1 shows the boot loader program flowchart. The shaded areas
represent portions of code; the square shapes depict registers containing data.
The boot loader reads the boot table from one of three memory locations 
(1000h, 810000h, 900000h) or from the serial port. The boot loader processes 
each block of the boot table separately. First, the words of the program or data 
are assembled from bytes (or half-words). The assembled words are then writ- 
ten to their destinations one at a time. Each block can be transferred to any 
memory address range within the memory map. The blocks in the boot table 
are preceded by three control words: block size, destination address, and 
strobe control register value. The boot loader ends execution when it finds a 
0 for the size of the next block. At that point, it initializes the three strobe control 
registers and branches to the first instruction of the first block. For that reason, 
the first boot table block always contains program information and not data. 
For information about the boot loader operation, see section B.3, Boot Loader 
Source Code Listing, on page B-6 and the TMS320C3x User’s Guide.

Figure B-1. TMS320C32 Boot Loader Program Flowchart

† Handshake mode is enabled by setting the IOXF0 bit of the IOF register to 1 when INT3 and any of the INT2, INT1, or INT0 signals
are asserted following reset.


Note: Shaded boxes indicate operations; white boxes indicate registers.

B.2 TMS320C32 Boot Loader Opcodes

Table B-1 lists the ’C32 boot loader opcodes (shown in boldface type). In most
cases, an opcode is the first byte of the machine code that describes the type 
of operation and combination of operands interpreted by the central proces- 
sing unit (CPU). 


Table B-1. TMS320C32 Boot Loader Opcodes


ADDRESS OPCODE ADDRESS OPCODE ADDRESS OPCODE ADDRESS OPCODE 
00000000 00000045 00000034 00000000 00000068 1a660001 0000009d 086800a7 
00000001 00000000 00000035 00000000 00000069 6a060004 0000009e 08650000 
00000002 00000000 00000036 00000000 0000006a O9e6fFfE 0000009£ 08620000 
00000003 00000000 00000037 00000000 0000006b O9eefffft 000000a0 o8so0acoof 
00000004 00000000 00000038 00000000 0000006c 09e50001 000000al1 08600111 
00000005 00000000 00000039 00000000 0000006d 6a00fffa 000000a2 15400743 
00000006 00000000 0000003A 00000000 0000006e 186e0002 000000a3 08670a30 
00000007 00000000 0000003B 00000000 O0000006£ 04ee0000 000000a4 09e70010 
00000008 00000000 0000003C 00000000 00000070 6a070002 000000a5 15470740 
00000009 00000000 0000003D 00000000 00000071 72000053 000000a6 6a00ffcc 
OO000000A 00000000 0000003E 00000000 00000072 6f80fffe 000000a7 1a770020 
0000000B 00000000 0000003F 00000000 00000073 70000008 000000a8 6a05fffe 
0000000C 00000000 00000040 00000000 00000074 15410704 000000a9 O2£70fdf 
0000000D 00000000 00000041 00000000 00000075 70000008 000000aa 0841074c 
0000000E 00000000 00000042 00000000 00000076 15410706 000000ab 78800000 
OOO00000F 00000000 00000043 00000000 00000077 70000008 000000ac 08630003 
00000010 00000000 00000044 00000000 00000078 15410708 000000ad 08730001 
00000011 00000000 00000045 086£4040 00000079 70000008 000000ae 09930005 
00000012 00000000 00000046 09ef0009 0000007a 08010001 000000af 18730001 
00000013 00000000 00000047 08740023 0000007b 6a060007 000000b0 080e0003 
00000014 00000000 00000048 1014000f 0000007c 08400704 000000b1 026e0001 
00000015 00000000 00000049 O871ffff 0000007d 15400760 000000b2 09ee0003 
00000016 00000000 0000004a 08000017 0000007e 08400706 000000b3 08000005 
00000017 00000000 0000004b 02e0000f O000007£ 15400764 000000b4 04e00001 
00000018 00000000 0000004c 04e00008 00000080 08400708 000000b5 6a050003 
00000019 00000000 0000004d 6a05004f 00000081 15400768 000000b6 O9e0fffL 
OO000001A 00000000 0000004e o8so0adoof 00000082 68000012 000000b7 O09eeffff 
0000001B 00000000 0000004£ 026a0060 00000083 081b0001 000000b8 6a00ff£b 
0000001C 00000000 00000050 1a600004 00000084 187b0001 000000b9 186e0001 
0000001D 00000000 00000051 536b4080 00000085 70000008 000000ba 08600000 
0000001E 00000000 00000052 6a060008 00000086 08so0d0001 000000bb 08610000 
0000001F 00000000 00000053 026a0004 00000087 4£100000 000000bc 02740003 
00000020 00000000 00000054 1a600001 00000088 5312000d 000000bd 72000007 
00000021 00000000 00000055 536b0008 00000089 53710000 000000be 18740003 
00000022 00000000 00000056 6a060004 0000008a 70000008 000000bf 21871306 
00000023 00000000 00000057 026a0004 0000008b 08040001 000000c0 09870000 
00000024 00000000 00000058 1a600004 0000008c 02e1006c 000000c1 10010007 
00000025 00000000 00000059 536b4800 0000008d 258c010f 000000c2 02000005 
00000026 00000000 0000005a 6a05ffef 0000008e O09e4fFf8 000000c3 6f£80ff£s 
00000027 00000000 0000005b 1a600008 0000008f 08030004 000000c4 78800000 
00000028 00000000 0000005c 6a050002 00000090 09e3fff0 000000c5 1a780002 
00000029 00000000 0000005d 1a780080 00000091 02e30003 000000c6 1542c200 
0000002A 00000000 0000005e 08780006 00000092 1a61000c 000000c7 6a060002 
0000002B 00000000 O000005£ 0862000f 00000093 52e30003 000000c8 08462301 
0000002C 00000000 00000060 09e20010 00000094 04e50000 000000c9 78800000 
0000002D 00000000 00000061 1042c200 00000095 52e900a7 000000ca 1b40c700 
0000002E 00000000 00000062 1542c200 00000096 536900ad 000000cb 1a780080 
0000002F 00000000 00000063 09eb0009 00000097 6400009b 000000cc 6a06fffd 
00000030 00000000 00000064 086800ac 00000098 70000009 000000cd 08462301 
00000031 00000000 00000065 08650001 00000099 1544c400 000000ce 08780002 
00000032 00000000 00000066 086e0020 0000009a 0c800000 000000cf 1a780080 
00000033 00000000 00000067 7200005d 0000009b 15412501 000000d0 6a05fffe 
0000009c 6a00ffdc 000000d1 08780006 

000000d2 78800000

B.3 Boot Loader Source Code Listing

* C32BOOT - TMS320C32 BOOT LOADER PROGRAM (143 words)             March-96
* (C) COPYRIGHT TEXAS INSTRUMENTS INCORPORATED, 1994              v.27
*
* NOTE:
*
* 1. Following device reset, the program waits for an external
*    interrupt. The interrupt type determines the initial address
*    from which the boot loader starts loading the boot table to the
*    destination memory:
*
*    INTERRUPT PIN     BOOT TABLE START ADDRESS    BOOT SOURCE
*    INTR0             1000h   (STRB0)             P_PORT
*    INTR1             810000h (IOSTRB)            P_PORT
*    INTR2             900000h (STRB1)             P_PORT
*    INTR3             80804Ch (sport0 Rx)         SERIAL
*    INTR0 and INT3    1000h   (STRB0)             ASYNC PPORT, XF0/XF1
*    INTR1 and INT3    810000h (IOSTRB)            ASYNC PPORT, XF0/XF1
*    INTR2 and INT3    900000h (STRB1)             ASYNC PPORT, XF0/XF1


No 


Ww 


+ + £ + + F F FF F FF F FF FF F F F F FF F KF OF 


= 
fo>) 


If INT3 is asserted together with INT2, or INT1, or INTO following 
reset, that indicates that the boot table is to be read 
asynchronously from EPROM using pins XFO and XF1 for handshaking. 
The handshaking protocol assumes that the data ready signal 
generated by the host arrives through pin XF1l. The data 
acknowledge signal is output from the C32 on pin XFO. Both 

signals are active low. The C32 continuously toggles the IACK 
signal while waiting for the host to assert data ready signal 

(pin XF1). 


The boot operation involves transfer of one or more source 
blocks from the boot media to the destination memory. The block 
structure of the boot table serves the purpose of distributing 
the source data/program among different memory spaces. Each 
block is preceded by several 32-bit control words describing 
the block contents to the boot loader program. 


When loading from the serial port, the boot loader reads the source 
data/program and writes it to the destination memory. There is 

only one way to read the serial port. When loading from EPROM, 
however, there are 4 ways to read and assemble the 

source contents, depending on the width of boot memory and the 


+ + + + + F F FF FF F F FF HF F 


+ + FF + + + FF FF FF FF F F F FF FF FF FF F F FF FF FF F F FF F KF OF 
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size of the program/data being transferred. Because there is a 
possibility that reads and writes can span the same STRB space, 
the boot loader loads the appropriate STRB control registers 
before each read and write. 


If the boot source is an EPROM whose physical width is less than 
32 bits, the physical interface of the EPROM device(s) to the 
processor must be the same as that of the 32-bit interface. 
(This involves a specific connection to the C32’s strobe and 
address signals). The reason for such an arrangement is that 

to function properly, the boot loader program always expects 
32-bit data from 32-bit wide memory during the boot load 
operation. Valid boot EPROM widths are : 1, 2, 4, 8, 16 

and 32 bits. 


A single source block cannot cross STRB boundaries. For
example, its destination cannot overlap STRB0 space and IOSTRB
space. Additionally, all of the destination addresses of a
single source block must reside in physical memory of the
same width. It is not permitted to mix program and data in the
same source block.


The boot loader stops boot operation when it finds a 0 in the
block size control word. Therefore, each boot table must
end with a 0, prompting the boot loader to branch to the
first address of the first block and start program execution
from that location.
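
The block structure can be illustrated with a short C sketch that walks a boot table. It assumes the control-word order used by the loader source (memory width word; IOSTRB, STRB0, and STRB1 values; then, per block, size, destination address, and destination strobe control, followed by the block contents) and, as a simplification, a table already unpacked into 32-bit words with a 32-bit data size. The names `boot_summary`, `decode_width`, and `walk_boot_table` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t mem_width;     /* boot memory width in bits                 */
    uint32_t entry;         /* destination address of the first block    */
    size_t   blocks;        /* number of source blocks found             */
    size_t   words_copied;  /* total number of data/program words        */
} boot_summary;

/* The position of the lowest set bit encodes the width (bit 0 = 1 bit,
 * bit 3 = 8 bits, bit 5 = 32 bits). */
static uint32_t decode_width(uint32_t word)
{
    uint32_t width = 1;
    while (width < 32 && (word & 1u) == 0) {
        word >>= 1;     /* look at next bit   */
        width <<= 1;    /* double the width   */
    }
    return width;
}

boot_summary walk_boot_table(const uint32_t *table)
{
    boot_summary s = { 0, 0, 0, 0 };
    size_t i = 0;

    s.mem_width = decode_width(table[i++]);
    i += 3;                           /* IOSTRB, STRB0, STRB1 values      */

    for (;;) {
        uint32_t size = table[i++];   /* block size control word          */
        if (size == 0)
            break;                    /* a 0 ends the boot table          */
        uint32_t dest = table[i++];   /* block destination address        */
        i++;                          /* destination strobe control word  */
        if (s.blocks == 0)
            s.entry = dest;           /* execution starts at first block  */
        i += size;                    /* skip over the block contents     */
        s.words_copied += size;
        s.blocks++;
    }
    return s;
}
```

A real loader copies each block to its destination instead of skipping it; the sketch only shows how the control words delimit the blocks and how the terminating 0 is found.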


* 'C32 boot loader program register assignments, and altered memory
* locations
*
* AR7 - peripheral memory map           IOF - XF0 (handshake - data acknowledge)
* AR0 - read cntrl data subr pointer    IOF - XF1 (handshake - data ready)
* AR1 - read block data/prg subr pointer
* R2  - read STRB value                 R4  - write STRB value
* AR2 - read STRB pointer               AR4 - write STRB pointer
* AR3 - read data/prg pointer           AR5 - write data/prg pointer
*                        read > R1 > write
* IR0 - EXEC start flag                 stack  - 808024h - TIM0 cnt reg
* IR1 - EXEC start address                       808028h - TIM0 per reg
*                                       IOSTRB - 808004h - DMA0 src reg
* R3  - data size                       STRB0  - 808006h - DMA0 dst reg
* R5  - mem width                       STRB1  - 808008h - DMA0 cnt reg
*
* R6  - memory read value               AR6,R7,R0,BK - scratch registers
*
reset   .word   start           ; reset vector
        .space  44h             ; program starts @45h
*
* Initialize registers 808000h --> AR7, 808023h --> SP, -1 --> IR0
*
start   LDI     4040h,AR7       ; load peripheral memory map
        LSH     9,AR7           ; base address = 808000h
        LDI     23h,SP          ; initialize stack pointer to
        OR      AR7,SP          ; 808023h (timer counter - 1)
        LDI     -1,IR0          ; reset exec start addr flag
*
* Test for INT3 and, if set exclusively, proceed with serial
* boot load. Else, load AR3 with 1000h if INT0, 810000h if INT1,
* 900000h if INT2. Also load the appropriate boot strobe pointer --> AR2
* and force the boot strobe value to reflect 32-bit memory width.
* If (INT0 or INT1 or INT2) and INT3, turn on the handshake mode.
*
wait1   LDI     IF,R0
        AND     0Fh,R0          ; clean
        CMPI    8,R0            ; test for INT3
        BEQ     serial          ;*******; serial boot load mode
        LDI     AR7,AR2
        ADDI    60h,AR2         ; 808060h (IOSTRB) --> AR2
        TSTB    2,R0            ; test for INT1
        LDINZ   4080h,AR3       ; 810000h / 2**9
        BNZ     exit3           ;*******;
        ADDI    4,AR2           ; 808064h (STRB0) --> AR2
        TSTB    1,R0            ; test for INT0
        LDINZ   8,AR3           ; 001000h / 2**9
        BNZ     exit3           ;*******;
        ADDI    4,AR2           ; 808068h (STRB1) --> AR2
        TSTB    4,R0            ; test for INT2
        LDINZ   4800h,AR3       ; 900000h / 2**9
        BZ      wait1           ;*******;
exit3   TSTB    8,R0            ;*; test#1 - INT3 asserted
        BZ      exit2           ;*; test#2 - INXF1 low (not used)
        TSTB    80h,IOF         ;*; enable handshake mode if
        LDI     6,IOF           ;*; test#1 passed
exit2   LDI     0Fh,R2
        LSH     16,R2           ; force boot data size to 32
        OR      *AR2,R2         ; force boot mem width to 32
        STI     R2,*AR2
        LSH     9,AR3           ; boot mem start addr --> AR3
*                                                     xx000001 -  1 bit
*                                                     xx000010 -  2 bit
* Process MEMORY WIDTH control word (32 bits long)    xx000100 -  4 bit
*                                                     xx001000 -  8 bit
*                                                     xx010000 - 16 bit
*                                                     xx100000 - 32 bit
        LDI     read_mc,AR0     ; use memory to read cntrl words
                                ; read_mc --> AR0
        LDI     1,R5            ; mem width = 1 (init)
        LDI     32,AR6          ; mem reads = 32 (init)
        CALLU   read_m          ; read memory once (1st read)
loop2   TSTB    1,R6
        BNZ     label4
        LSH     -1,R6           ; look at next bit
        LSH     -1,AR6          ; decr mem reads
        LSH     1,R5            ; incr mem width --> R5
        BU      loop2           ;*******;
label4  SUBI    2,AR6
        CMPI    0,AR6           ; set flags
        BN      strobes         ;*******; total # of mem reads = 32/R5
label5  CALLU   read_m          ; read memory once
        DBU     AR6,label5      ;*******;
*
* Read and save IOSTRB, STRB0 & STRB1 (to be loaded at end of
* boot load)
*
strobes CALLU   AR0
        STI     R1,*+AR7(4)     ; IOSTRB --> (DMA src)
        CALLU   AR0
        STI     R1,*+AR7(6)     ; STRB0 --> (DMA dst)
        CALLU   AR0
        STI     R1,*+AR7(8)     ; STRB1 --> (DMA cnt)
*
* Process block size (# of bytes, half-words, or words after STRB
* cntrl)
*
block   CALLU   AR0             ; read boot memory cntrl word
        LDI     R1,R1           ; is this the last block ?
        BNZ     label2          ;*******; no, go around

        LDI     *+AR7(4),R0     ; (DMA src)
        STI     R0,*+AR7(60h)   ; restore IOSTRB
        LDI     *+AR7(6),R0     ; (DMA dst)
        STI     R0,*+AR7(64h)   ; restore STRB0
        LDI     *+AR7(8),R0     ; (DMA cnt)
        STI     R0,*+AR7(68h)   ; restore STRB1

        BU      IR1             ;*******; branch to start of program

label2  LDI     R1,RC           ; setup transfer loop
        SUBI    1,RC            ; RC - 1 --> RC

*
* Process block destination address, save start address of first
* block
*
        CALLU   AR0             ; read boot memory cntrl word
        LDI     R1,AR5          ; set dest addr --> AR5
        CMPI    0,IR0           ; look at EXEC start addr flag
        LDINZ   AR5,IR1         ; if -1, EXEC start addr --> IR1
        LDINZ   0,IR0           ; set EXEC start addr flag
*
* (For internal destination, this word must be 0 or 60h. The first
* case results in 0 --> DMA control register, in second case 0 -->
* IOSTRB register).
* Process block destination strobe control (sss...sss 0110 xx00)
*                               strb value ====       00 - IOSTRB
*                                                     01 - STRB0
        CALLU   AR0             ;                     10 - STRB1
        LDI     R1,R4
        AND     6Ch,R1          ; dest mem strb pntr --> AR4
        OR3     AR7,R1,AR4
        LSH     -8,R4           ; dest memory strobe --> R4
        LDI     R4,R3
        LSH     -16,R3
        AND     3,R3            ; dest data size --> R3
        TSTB    0Ch,R1
        LDIZ    3,R3            ; (IOSTRB case)


*
* Look at R5 and choose serial or memory read for block data/program
*
        CMPI    0,R5
        LDIEQ   read_s0,AR1     ; read serial port0
        LDINE   read_mb,AR1     ; read memory
*
* Transfer one block of data or program
*
        RPTB    loop4
        CALLU   AR1             ; read data/prg
        STI     R4,*AR4         ; set write strobe
        NOP                     ; pipeline
loop4   STI     R1,*AR5++
        BU      block           ;*******; process next block


*
* Load R5 with 0, load read_s0 to AR0 and initialize serial port_0
*
serial  LDI     read_s0,AR0     ; use serial to read cntrl words
        LDI     0,R5            ; memory WIDTH = serial
        LDI     0,R2            ; dummy
        LDI     AR7,AR2         ; dummy
        LDI     111h,R0         ; 0000111h --> R0
        STI     R0,*+AR7(43h)   ; set CLKR,DR,FSR as serial
        LDI     0A30h,R7        ; port pins
        LSH     16,R7           ; A300000h --> R7
        STI     R7,*+AR7(40h)   ; set serial global cntrl reg
        BU      strobes         ;*******; process first block
*
* Read a single value from serial or boot memory. The number of
* memory reads depends on memory width and data size. R1 returns the
* read value. (Serial sim: NOP --> BZ read_s0 & LDI @4000H,R1 --> LDI
* *+AR7(4Ch),R1)
*
read_s0 TSTB    20h,IF          ; look at RINT0 flag
        BZ      read_s0         ; wait for receive buffer full
        AND     0FDFh,IF        ; reset interrupt flag
        LDI     *+AR7(4Ch),R1   ; read data --> R1
        RETSU
*
read_mc LDI     3,R3            ; data size = 32, 3 --> R3
read_mb LDI     1,BK            ; 00000001 (ex: mem width=8)
        LSH     R5,BK           ; 00000100
        SUBI    1,BK            ; 000000FF = mask --> BK
        LDI     R3,AR6          ;  0 -   1 000  EXPAND
        ADDI    1,AR6           ;  1 -  10 000  DATA --> AR6
        LSH     3,AR6           ; 11 - 100 000  SIZE
        LDI     R5,R0
loop3   CMPI    1,R0
        BEQ     exit1           ; DATA SIZE
        LSH     -1,R0           ; --------- --> AR6
        LSH     -1,AR6          ; MEM WIDTH
        BU      loop3           ;*******;
exit1   SUBI    1,AR6
        LDI     0,R0            ; init shift value
        LDI     0,R1            ; init accumulator
loop1   ADDI    3,SP            ; 808027h --> SP
        CALLU   read_m          ; read memory once --> R6
        SUBI    3,SP            ; 808024h --> SP
        AND3    R6,BK,R7        ; apply mask
        LSH     R0,R7           ; shift
        OR      R7,R1           ; accumulate --> R1
        ADDI    R5,R0           ; increment shift value
        DBU     AR6,loop1       ;*******; decrement #of chunks --> AR6
        RETSU


*
* Perform a single memory read from the source boot table.
* Handshake enabled if IOXF0 bit of IOF reg is set, disabled when
* reset. IACK will pulse continuously if handshake enabled and data
* not ready (to achieve zero-glue interface when connecting to a C40
* comm-port)
*
read_m  TSTB    2,IOF           ; handshake mode enabled ?
        STI     R2,*AR2         ; set read strobe !!!!!!!!!!!!!
        BNZ     loop5           ; yes, jump over
        LDI     *AR3++,R6       ; no, just read memory & return
        RETSU                   ; (C40)
loop5   IACK    *AR7            ; intrnl dummy read pulses IACK
        TSTB    80h,IOF         ; wait for data ready
        BNZ     loop5           ; (XF1 low from host)
        LDI     *AR3++,R6       ; read memory once --> R6
        LDI     2,IOF           ; assert data acknowledge
                                ; (XF0 low to host)
loop6   TSTB    80h,IOF         ; wait for data not ready
        BZ      loop6           ; (XF1 high from host)
        LDI     6,IOF           ; deassert data acknowledge
        RETSU                   ; (XF0 high to host)
This appendix describes the two memory models that can be used to access
data when programming in C.


Two memory models can be used to access data when programming in C. In
the small model (default), the external bus cycles use direct addressing to
access data from memory. Direct addressing uses 16 bits of address in the
instruction opcode. The address is combined with the 8-bit data page (defined
beforehand) to access the data from memory. The 16-bit address limits the
number of words that the small model can access to 64K words. However, this
model produces fast and compact code because each data access uses only
a single instruction (see Figure C-1).


The big model is not limited to 64K words because each data access in C
explicitly sets the data page pointer (DP register). The 8-bit data page and 16-bit
direct address are combined for a total address reach of 16M words, but at a
price of two instructions per data access (see Figure C-1).
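
The address arithmetic behind both models can be shown with a small C sketch. It assumes only what the text states: an 8-bit data page supplies the upper bits and the 16-bit direct address supplies the lower bits of a 24-bit address. The function names `direct_address` and `same_page` are illustrative.

```c
#include <stdint.h>

/* Combine the 8-bit data page with the 16-bit direct address from the
 * opcode to form the 24-bit address of the operand. */
uint32_t direct_address(uint8_t data_page, uint16_t direct)
{
    return ((uint32_t)data_page << 16) | direct;
}

/* Two operands can share one data page setting (small model style
 * access) only if their upper 8 address bits are equal. */
int same_page(uint32_t addr_a, uint32_t addr_b)
{
    return (addr_a >> 16) == (addr_b >> 16);
}
```

For instance, a data page of 88h and a direct address of 3FFFh select address 883FFFh. Operands at 880001h and 8A0003h lie in different pages, which is why the big model reloads the DP register before each access.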


Dynamically allocated memory can be used if the application needs a large
address reach, compact code size, and fast execution. The MALLOC function
from the runtime support library (RTS) can be called at run time to reserve a
block of memory in the .SYSMEM section. MALLOC returns a
pointer to the newly allocated block. Any reference to that block of memory
results in assembled code using indirect addressing, in which the opcode
contains a pointer to the auxiliary register that holds the address of the operand
(see Figure C-1). Code referring to the dynamically allocated memory is fast
and has a 16M-word address reach (24 bits). The price is a one-time call to
MALLOC for each dynamically allocated array. For that reason, MALLOC is
most efficient with large data arrays, where the overhead associated with the
call is insignificant compared to the large number of data accesses that
use the arrays.
Figure C-1. Memory Allocation in C Programs

(a) Small model (default)

    • Static memory, assigned at compile time
    • Maximum size: 64K words
    • Fast execution

    TMS320C32 memory:  STRB --> .bss (small), .text

    C statement:       C = A + B

    Equivalent assembly code:
                       LDI   @0FFFDh,R0
                       LDI   @0FFFEh,R1
                       ADDI  R0,R1
                       STI   R1,@0FFFh

(b) Big model (-mb option)

    • Static memory, assigned at compile time
    • Maximum size: 16M words
    • Slow execution

    TMS320C32 memory:  STRB --> .bss (big), .text

    C statement:       C = A + B

    Equivalent assembly code:
                       LDP   @880001h,DP
                       LDI   @880001h,R0
                       LDP   @1002h,DP
                       LDI   @1002h,R1
                       ADDI  R0,R1
                       LDP   @8A0003h,DP
                       STI   R1,@8A0003h

(c) RTS library (MALLOC)

    • Dynamic memory, assigned at execution time
    • Maximum size: 16M words
    • Fast execution
    • Best for big arrays (one-time overhead: MALLOC call)

    TMS320C32 memory:  STRB --> .sysmem, .text

    C statement:       C = A + B

    Equivalent assembly code:
                       LDI   *AR0,R0
                       LDI   *AR1,R1
                       ADDI  R0,R1
                       STI   R1,*AR2
Figure C-2 shows how to use MALLOC to allocate a block of 32-bit memory
at run time. In this example, MALLOC is called three times to allocate memory
from the heap.


After each MALLOC call, the newly allocated block of memory can be used by
other program functions by using the pointer BUFFER_32. The size of the
heap (representing all of dynamically allocated memory) is defined in the linker
command file by using the HEAP keyword followed by the size of the block.
Any portion of the heap allocated with the MALLOC call is added to the
.SYSMEM section. The SECTIONS directive can then be used to map the
dynamically allocated sections to an address range in the physical memory.
(For more information, see the TMS320C3x/C4x Assembly Language Tools
User’s Guide or TMS320C3x/C4x Optimizing C Compiler User’s Guide.)


Dynamically allocated memory provides the only method for a C program to
access 8- or 16-bit wide memory. This means that physical memory that is less
than 32 bits wide cannot be accessed using small or big model addressing.
Instead, the MALLOC8 and MALLOC16 RTS library functions can allocate
blocks of 8- and 16-bit wide memory. These routines work like the 32-bit
MALLOC by returning pointers to 8- or 16-bit memory blocks. These can be
used by code that follows the MALLOC call to access that memory (see
Figure C-3 and Figure C-4). The 8-bit data allocated by MALLOC8 is placed
in the .SYSM8 section by the linker, while the 16-bit data is deposited in the
.SYSM16 section. HEAP8 and HEAP16 linker keywords limit the total amount
of 8- or 16-bit memory that the C compiler can allocate into those sections. (For
more information, see the TMS320C3x/C4x Optimizing C Compiler User’s
Guide.)
Figure C-2. Dynamic Memory Allocation for TMS320C32 (One Block of 32-Bit Memory)

(a) C code

    ...
    int *BUFFER_32;                           /* declare a pointer to a pool of 32-bit memory */
    ...
    BUFFER_32 = MALLOC(2048 * sizeof(int));   /* allocate 2K words of memory */
    dsp_func4(BUFFER_32);                     /* use the above memory */
    ...
    BUFFER_32 = MALLOC(512 * sizeof(int));    /* allocate 0.5K words of memory */
    dsp_func5(BUFFER_32);                     /* use the above memory */
    ...
    BUFFER_32 = MALLOC(1024 * sizeof(int));   /* allocate 1K words of memory */
    dsp_func6(BUFFER_32);                     /* use the above memory */
    ...

(b) LINKER command file

    ...
    -heap 0x4000                              /* set the size of the dynamic 32-bit memory section */
    ...
    STRB_RAM  org = 0x1000, len = 0x8000      /* define physical 32-bit memory */
    ...
    .sysmem > STRB_RAM                        /* assign logical section to physical memory */
    ...

[Diagram: TMS320C32 (also 'C31, 'C30) with STRB connected to 32-bit wide memory holding the .sysmem, .bss, and .text sections.]


Figure C-3. Dynamic Memory Allocation for TMS320C32 (One Block of 16-Bit Memory)

(a) C code

    ...
    int *BUFFER_16;                             /* declare a pointer to a pool of 16-bit memory */
    ...
    *(volatile int *)0x808064 = 0x5000;         /* STRB0 control register data size = 16, memory width = 16 */
    ...
    BUFFER_16 = MALLOC16(1024 * sizeof(int));   /* allocate 2K half-words of memory */
    dsp_func4(BUFFER_16);                       /* use the above memory */
    ...
    BUFFER_16 = MALLOC16(512 * sizeof(int));    /* allocate 1K half-words of memory */
    dsp_func5(BUFFER_16);                       /* use the above memory */
    ...
    BUFFER_16 = MALLOC16(2048 * sizeof(int));   /* allocate 4K half-words of memory */
    dsp_func6(BUFFER_16);                       /* use the above memory */
    ...

(b) LINKER command file

    ...
    -heap16 0x4000                              /* set the size of the dynamic 16-bit memory section */
    ...
    STRB0_RAM  org = 0x880000, len = 0x8000     /* define physical 16-bit memory */
    ...
    .sysm16 > STRB0_RAM                         /* assign logical section to physical memory */
    ...

(c) C32 external memory contents

[Diagram: TMS320C32 with STRB0 connected to 16-bit wide memory holding the .sysm16 section; IOSTRB and STRB1 connected to 32-bit wide memory.]
Figure C-4. Dynamic Memory Allocation for TMS320C32 (One Block Each of 32-, 16-,
and 8-Bit Memory)

(a) C code

    ...
    int *BUFFER_32;                             /* declare a pointer to a pool of 32-bit memory */
    int *BUFFER_16;                             /* declare a pointer to a pool of 16-bit memory */
    int *BUFFER_08;                             /* declare a pointer to a pool of 8-bit memory */
    ...
    *(volatile int *)0x808064 = 0x5000;         /* STRB0 control register data size = 16, memory width = 16 */
    *(volatile int *)0x808068 = 0x0000;         /* STRB1 control register data size = 8, memory width = 8 */
    ...
    BUFFER_32 = MALLOC(1024 * sizeof(int));     /* allocate 1K words of memory */
    BUFFER_16 = MALLOC16(1024 * sizeof(int));   /* allocate 2K half-words of memory */
    BUFFER_08 = MALLOC8(1024 * sizeof(int));    /* allocate 4K bytes of memory */
    dsp_func1(BUFFER_32, BUFFER_16, BUFFER_08); /* use the above memory */
    ...
    BUFFER_32 = MALLOC(2048 * sizeof(int));     /* allocate 2K words of memory */
    BUFFER_16 = MALLOC16(512 * sizeof(int));    /* allocate 1K half-words of memory */
    dsp_func2(BUFFER_32, BUFFER_16);            /* use the above memory */
    ...
    BUFFER_08 = MALLOC8(4096 * sizeof(int));    /* allocate 16K bytes of memory */
    dsp_func3(BUFFER_08);                       /* use the above memory */
    ...

(b) LINKER command file

    ...
    -heap   0x4000                              /* set the size of the dynamic 32-bit memory section */
    -heap16 0x4000                              /* set the size of the dynamic 16-bit memory section */
    -heap8  0x4000                              /* set the size of the dynamic 8-bit memory section */
    ...
    IOSTRB_RAM  org = 0x810000, len = 0x8000    /* define physical 32-bit memory */
    STRB0_RAM   org = 0x880000, len = 0x8000    /* define physical 16-bit memory */
    STRB1_RAM   org = 0x900000, len = 0x8000    /* define physical 8-bit memory */
    ...
    .sysmem > IOSTRB_RAM                        /* assign logical section to physical memory */
    .sysm16 > STRB0_RAM                         /* assign logical section to physical memory */
    .sysm8  > STRB1_RAM                         /* assign logical section to physical memory */

(c) ’C32 external memory contents

[Diagram: TMS320C32 with IOSTRB connected to 32-bit wide memory (.sysmem), STRB0 connected to 16-bit wide memory (.sysm16), and STRB1 connected to 8-bit wide memory (.sysm8).]
Memory Interface and Address Translation


This appendix describes how to use the 'C32’s memory interfaces to connect 
to various external devices. 


The ’C32 memory interface supports variable-width memory and variable-size
data. The physical width of a memory bank connected to the ’C32 can be 8,
16, or 32 bits. When connecting 16-bit external memory, the A-1 address
pin must be connected to the A0 pin of the memory device, causing a 1-bit shift
in the connection of the remaining address lines. For 8-bit memory, two extra
address pins are used (A-1 and A-2), effectively shifting the external address
by two bits. No external address shift is needed for connecting 32-bit wide
memory (or boot table memory, regardless of its width).


The ’C32 can access data of any size, regardless of the physical width of an
external memory bank. For example, byte-wide data can be packed in 16-bit
memory, or 32-bit data can be accessed from 8-bit wide memory. The latter
takes four cycles. The variable-data size feature is made possible by dividing
the STRB0 or STRB1 controls into four signals each. The four control signals,
in addition to being strobes, serve a byte-enable function.
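
The relationship between data size and memory width can be sketched in C. This is a simplified model that counts memory accesses only; wait states and bus timing are ignored, and the function names `accesses_per_item` and `items_per_location` are hypothetical.

```c
/* Number of memory accesses needed to transfer one data item.
 * Data wider than the memory is split across several accesses. */
unsigned accesses_per_item(unsigned data_bits, unsigned width_bits)
{
    return (data_bits > width_bits) ? data_bits / width_bits : 1u;
}

/* Number of data items that pack into one memory location.
 * Data narrower than the memory shares a location, and the
 * byte-enable strobes select the part being accessed. */
unsigned items_per_location(unsigned data_bits, unsigned width_bits)
{
    return (width_bits > data_bits) ? width_bits / data_bits : 1u;
}
```

Under this model, 32-bit data in 8-bit wide memory needs four accesses per item, and 16-bit wide memory holds two bytes of byte-wide data per location, matching the two examples in the text.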


Figure D-1 shows examples of three ‘C32 systems, each connected to a 
memory bank of a different width. 


Regardless of memory width, the data inside each bank can be 8, 16, or 32 
bits wide. Before data of a particular size can be accessed, the respective 
strobe control register must be programmed for that size. While the data size 
can vary, the program is always 32 bits wide. Even if they are different sizes, 
program and data can reside within the same physical bank of memory. 


Up to two data sizes can reside simultaneously alongside the 32-bit program
in a single bank (see Figure D-2).
Figure D-1. Data and Program Packing (Program and a Single Data Size)

[Figure: three TMS320C32 systems. The first uses 32-bit memory on a 32-bit wide data bus and holds 32-bit program and 16-bit data. The second uses 16-bit memory on a 16-bit wide data bus and holds 32-bit program and 8-bit data. The third uses 8-bit memory on an 8-bit wide data bus and holds 16-bit data.]

NOTE: 8-bit programs are not supported.
Figure D-2. Data and Program Packing (Program and Two Different Data Sizes)

[Figure: three TMS320C32 systems. The first uses 32-bit memory on a 32-bit wide data bus and holds 8-bit data, 32-bit program, and 16-bit data. The second uses 16-bit memory on a 16-bit wide data bus and holds 16-bit or 32-bit data, 32-bit program, and 8-bit data. The third uses 8-bit memory on an 8-bit wide data bus and holds 8-bit data and 16-bit data.]

NOTE: 8-bit programs are not supported.


Since there are two strobes that support flexible memory (STRB0 and
STRB1), they each can be programmed for a different data size using the
respective strobe control registers. By setting the strobe configuration bit in one
control register, both STRB0 and STRB1 strobes can be mapped to STRB0
control signals. This creates a section of physical memory that is mapped into
the same address range as another section of memory with a hardware switch
to determine which range is active. In this overlay mode, data accesses to and
from the STRB0 and STRB1 portions of the memory map drive the STRB0
signals to control a single memory bank. The access to the program and to two
different data sizes from a single memory bank with no additional logic devices
is a powerful ’C32 feature that minimizes system cost with no performance
penalty. See the TMS320C3x User’s Guide for more information on the ’C32
enhanced external memory interface.


The translation starts when an instruction requests a data read from a certain
external address. Address locations referenced by program instructions are
logical addresses. Before the logical address shows up on the external pins
of the ’C32, it may undergo a 1- or 2-bit shift to the right that depends only on
the size of the data being accessed. The address at the pins is a physical
address. Before it is presented at the pins of the memory device, the physical
address may again be shifted (this time to the left) if the memory is other than
32 bits wide. The physical-to-memory address shift is one bit for 16-bit wide
memory and two bits for 8-bit wide memory. Table D-1 and Table D-2
summarize the rules that apply to the variable data size and memory width for any
’C32 system.
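
The two shifts can be expressed as a short C sketch. It assumes the shift amounts stated above (right shift of 0, 1, or 2 bits for 32-, 16-, or 8-bit data; left shift of 0, 1, or 2 bits for 32-, 16-, or 8-bit wide memory) and works on an offset relative to the start of a bank. It does not model the byte-enable strobes or the handling of the upper address bits, and all function names are hypothetical.

```c
#include <stdint.h>

/* Right shift applied between logical and physical address;
 * depends only on the data size. Returns -1 for an invalid size. */
int logical_to_physical_shift(unsigned data_bits)
{
    switch (data_bits) {
    case 32: return 0;
    case 16: return 1;
    case 8:  return 2;
    default: return -1;
    }
}

/* Left shift applied between physical and memory address;
 * depends only on the memory width. Returns -1 for an invalid width. */
int physical_to_memory_shift(unsigned width_bits)
{
    switch (width_bits) {
    case 32: return 0;
    case 16: return 1;
    case 8:  return 2;
    default: return -1;
    }
}

/* First memory location touched when accessing the data item at
 * logical_offset, counted from the start of the bank.
 * Returns UINT32_MAX for an invalid size or width. */
uint32_t memory_offset(uint32_t logical_offset, unsigned data_bits,
                       unsigned width_bits)
{
    int right = logical_to_physical_shift(data_bits);
    int left  = physical_to_memory_shift(width_bits);

    if (right < 0 || left < 0)
        return UINT32_MAX;
    /* Only the net shift matters for the first location accessed. */
    return (left >= right) ? logical_offset << (left - right)
                           : logical_offset >> (right - left);
}
```

For example, the fifth 32-bit item (offset 5) in 8-bit wide memory starts at memory location 20, while the byte at offset 5 in 32-bit wide memory lies in memory location 1.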


Table D-1. Variable Memory Width

Memory                     Physical Address     Physical Address to
Width    Strobes Valid     Lines Valid          Memory Address Shift (bits)
32       STRBx_B3          A23-A0               0
         STRBx_B2
         STRBx_B1
         STRBx_B0
16       STRBx_B1          A23-A0, A-1          1
         STRBx_B0
8        STRBx_B0          A23-A0, A-1, A-2     2


Table D-2. Variable Data Size

             Logical to Physical
Data Size    Address Shift (bits)
32           0
16           1
8            2

Figure D-3 through Figure D-11 show how the address changes when accessing
data of varying size from memory that is 32, 16, and 8 bits wide. The three
data sizes and three memory widths comprise the nine cases that cover all
possible combinations.
Figure D-3. Address Translation for 32-Bit Data Stored in 32-Bit-Wide Memory

[Figure: memory map (IOSTRB, STRB0, STRB1), logical address (23 to 0), physical address, and memory address space (words w1 to w32768) for memory devices on a 32-bit data bus enabled by STRB0_B3 through STRB0_B0; 32-bit data size.]

Note: The amount of shift between logical and physical addresses depends only on the size of data being transferred.
Figure D-4. Address Translation for 16-Bit Data Stored in 32-Bit-Wide Memory

[Figure: memory map (IOSTRB, STRB0, STRB1), logical address (23 to 0), physical address (14 to 0), memory address (14 to 0), and memory address space 0h to 7FFFh (half-words hw1 to hw65536) for memory devices on a 32-bit data bus enabled by STRB0_B3 through STRB0_B0; 16-bit data size.]

Note: The amount of shift between logical and physical addresses depends only on the size of data being transferred.
Figure D-5. Address Translation for 8-Bit Data Stored in 32-Bit-Wide Memory

CPU instruction: STI R0,@FFFh; DP = 01

[Figure: memory map (IOSTRB, STRB0), logical address space (bytes b1 to b131071), and memory address space up to 7FFFh for memory devices on a 32-bit data bus enabled by STRB0_B3 through STRB0_B0; 8-bit data size.]

Note: The amount of shift between logical and physical addresses depends only on the size of data being transferred.
Figure D-6. Address Translation for 32-Bit Data Stored in 16-Bit-Wide Memory

CPU instruction: STI R0,@3FFFh; DP = 88h

[Figure: STRB0 configuration, memory address space, and address connections to the memory devices enabled by STRB0_B1 and STRB0_B0. Logical address shift = 0 bits (32-bit data size). Physical address shift = 1 bit.]

Notes: 1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.
Figure D-7. Address Translation for 16-Bit Data Stored in 16-Bit-Wide Memory

CPU instruction: STI R0,@7FFFh; DP = 88h

[Figure: memory map (STRB0, IOSTRB), logical address space (half-words hw1, hw2, hw3, hw4 at 880000h to 880003h), and memory address space; 16-bit data size. Physical address shift = 1 bit (16-bit memory width).]

Notes: 1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.
Figure D-8. Address Translation for 8-Bit Data Stored in 16-Bit-Wide Memory

CPU instruction: STI R0,@0FFFh; DP = 90h

[Figure: memory map (IOSTRB), logical address space (bytes b1, b2, b3, b4 at 880000h to 880003h), and memory address space up to 7FFFh. Logical address shift = 2 bits (8-bit data size). Physical address shift = 1 bit.]

Notes: 1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.
Figure D-9. Address Translation for 32-Bit Data Stored in 8-Bit-Wide Memory

[Figure: STRB1 memory width and data size settings, logical address, and memory address space on an 8-bit data bus; 32-bit data size. Physical address shift = 2 bits (8-bit memory width).]

Notes: 1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.
Figure D-10. Address Translation for 16-Bit Data Stored in 8-Bit-Wide Memory

CPU instruction: STI R0,@3FFFh; DP = 90h

[Figure: memory address space. Physical address shift = 2 bits (8-bit memory width).]

Notes: 1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.
Figure D-11. Address Translation for 8-Bit Data Stored in 8-Bit-Wide Memory

CPU instruction: STI R0,@7FFFh; DP = 90h

[Figure: memory address space. Physical address shift = 2 bits (8-bit memory width).]

Notes: 1) The amount of shift between logical and physical addresses depends only on the size of data being transferred.

2) The amount of shift in the physical connection between the ’C32 and the external memory depends only on the width of the memory bank.


12-pin connector, dimensions 10-9 
16/8-bit memory configuration design 
examples 4-41 
data size equals memory width 4-43 
data size is greater than memory width 4-45 
data size is less than memory width 4-47 
16-bit dynamic memory allocation 4-84 
32-bit memory configuration design examples 4-35 
data size equals memory width 4-35 
data size is less than memory width 4-38 
8-bit static memory allocation 4-78 


A-law 
compression 6-5 
expansion 6-6 
adaptive filters 6-15 
addition example, extended-precision 
arithmetic 3-16 
address space segmentation 4-12 
AIC initialization 
AlC reset 8-21 
C31 timer initializing 8-22 
initializing AIC 8-24 
primary communications 8-25 
data format 8-25 
mode selection 8-25 
secondary communications 8-25 
control register bit fields 8-26 
data format 8-26 
serial port initializing 8-23 
algorithm partitioning, to determine power supply 
requirement 12-4 
algorithms, DSP 6-1 to 6-102 
analog-to-digital converters (ADC), interface to the 
*C30 expansion bus 8-2 to 8-5 


Index 


ANDing of the ready signals 4-11 


application-oriented operations 
adaptive filters 6-15 
companding 6-2 to 6-6 
fast Fourier transforms (FFT) 6-28 
FIR filters 6-7 
IIR filters 6-9 
lattice filters 6-18 
matrix-vector multiplication 6-24 
arithmetic operations 
bit manipulation 3-2 
bit-reversed addressing 3-5 
block moves 3-4 
extended-precision arithmetic 3-16 
floating-point format conversion 3-20 
integer and floating-point division 3-6 
square root 3-13 
assembler/linker 11-2 
assembly language instructions 
parallel instructions advantages 5-5 
SUBC instruction, integer division 3-6 


bank memory control logic 4-18 


bank switching 
external bus 4-15 
for Cypress Semiconductor’s CY7C185 
SRAM 4-17 
techniques 4-15 
timing for read operations 4-19 


benchmarks, for common ’C3x operations 6-78 
biquad 6-9 

bit manipulation 3-2 

bit-reversed addressing 3-5 

bit-reversed addressing, in C 5-9 




block 
moves 3-4 
repeat 2-18 
in a loop 2-19 
using to find a maximum 2-20 
boot 
from a byte-wide ROM A-4 
from serial port, to 8-, 16-, and 32-bit-wide 
RAM A-5 
boot loader program, ’C32 B-1 to B-14 
flowchart B-3 
opcodes B-5 
source code description B-2 
source code listing B-6 
boot table 
C32, examples A-1 
C32 
host load 4-102 
memory configuration 4-100, 4-101 
memory considerations 4-99 
branches, delayed 2-17 
breakdown of numbers 11-10 
.bss section, linking C data objects separate 
from 5-13 to 5-15 
buffered signals 10-7 
MPSD 10-6 
buffering 10-5 
bulletin board service (BBS) 11-6 
Burr-Brown DSP 101/2 and 201/2, interface to 
’C3x 8-10 to 8-20 


C compiler 11-2 
C30 
power dissipation 12-1 to 12-26 
photo of IDD for FFT 12-26 
primary bus, addressing up to 68 gigawords 4-107 
C31 
serial port, initializing 8-23 
timer 
initializing 8-22 
maximum timer period register value 8-22 
minimum timer period register value 8-22 




C32 
boot loader program B-1 to B-14 
boot table 
examples A-1 
host load 4-102 
memory configuration 4-101 
memory considerations 4-99 
booting in a C environment 4-86 
configuration examples 
2 external memory banks 4-74 
single external memory bank 4-80 
interfacing memory to 
1 bank/2 strobes (32-bit-wide memory) 4-49 
1 bank/2 strobes address translation for data 
size equal to 16 and 32 bits 4-55 
1 bank/2 strobes address translation for data 
size equal to 16 and 8 bits 4-51 
1 bank/2 strobes address translation for data 
size equal to 32 and 8 bits 4-53 
16/8-bit memory configuration design 
examples 4-41 
32-bit memory configuration design 
examples 4-35 
logical versus physical address 4-33 
program fetch from 16-bit STRB0 
memory 4-29 
program fetch from 32-bit STRB1 
memory 4-31 
RDY signal generation 4-57 
STRB0 and STRB1 data access 4-25, 4-27 
memory, address spaces 4-69 
memory configuration, for normal program 
execution 4-100 
TMS320 tools interaction with enhanced memory 
interface 4-67 
C compiler 4-69 
C compiler and assembler switch 4-72 
configuration examples 4-74 
debugger configuration 4-73 
linker switches 4-73 


calculation of TMS320 power dissipation, photo of 
IDD for FFT 12-26 


C-callable routines 5-2 
ceramic resonators 9-1 to 9-24 
circular addressing, FIR filters 6-7 


clock oscillator 9-1 to 9-24 
circuitry 1-3 


COFF file 
generating 4-86 
assembler 4-87 
compiler 4-87 
linker 4-88 
out file 4-90 
loading to the target system 4-91 


communications 
primary 8-25 
secondary 8-25 


companding 6-2 to 6-6 
compiler 11-2 


compression 
A-law 6-5 
μ-law 6-3 


computed GOTO 2-22 


connector 
12-pin header 10-2 
mechanical dimensions 10-8 to 10-9 


context switching 2-11 
context restore for ’C3x 2-15 to 2-17 
context save for ’C3x 2-13 to 2-14 


control registers, STRB0 and STRB1 4-23 
conversion, time to frequency domain 6-28 


converters 
A/D 
AD1678 8-2 
interface to the ’C30 expansion 
bus 8-2 to 8-5 
read operations timing between the C30 and 
AD1678 8-4 
Burr-Brown DSP101/2 and DSP201/2, interface 
to ’C3x 8-10 to 8-20 
D/A 
interface to the C30 expansion bus 8-6 
timing diagram for write operation 8-8 


CS4215, interface to the ’C3x 8-39 to 8-65 


current calculations 12-24 to 12-26 
average 12-25, 12-26 
data output 12-25 
processing 12-24 




data objects, linking C separate from 
.bss 5-13 to 5-15 
DATA_SECTION pragma directive 4-70 
debugger 11-3 
boot 4-91 
RAM model (linker -cr option) 4-92 
ROM model (linker -c option) 4-92 
configuration, for C32 external memory 4-73 
delayed branches 2-17 
development support 11-1 to 11-10 
development support tools 11-2 to 11-6 
bulletin board service 11-6 
code generation tools 11-2 
assembler/linker 11-2 
C compiler 11-2 
linker 11-2 
documentation 11-5 
hotline 11-5 
literature 11-5 
seminars 11-5 
system integration and debug tools 11-3 
debugger 11-3 
emulation porting kit (EPK) 11-4 
emulator 11-3 
evaluation module (EVM) 11-3 
simulator 11-3 
XDS510 emulator 11-3 
technical training organization (TTO) workshop 11-5 
third parties 11-4 
workshops 11-5 
device 
nomenclature (TMS320) 11-10 
suffixes 11-10 
diagnostic applications 10-10 
digital-to-analog converters (DAC), interface to the 
’C30 expansion bus 8-6 to 8-9 
dimensions, 12-pin header 10-8 to 10-9 
division, floating-point 3-10 
DMA 
block moves 3-4 
programming hints 7-2 
setup and use examples 7-4 to 7-10 
documentation 11-5 




emulation porting kit (EPK) 11-4 
emulator 11-3 
cable, signal timing, MPSD 10-4 
connection to target system 10-5 to 10-7 
MPSD mechanical dimensions 10-8 to 10-9 
MPSD connector, 12-pin header 10-2 
pod 
MPSD timing 10-4 
parameters 10-4 
pod interface 10-3 
signal buffering 10-5 
enhanced memory interface, ’C32, functional 
description 4-24 
evaluation module (EVM) 11-3 
example circuit, for wait states and ready 
generation 4-14 
expansion 
A-law 6-6 
μ-law 6-4 
expansion bus interface, ready 
generation 4-10 to 4-20 
functions 4-12 
extended-precision 
addition example 3-16 
arithmetic 3-16 
multiplication example 3-18 
subtract example 3-17 
external 
buses (expansion, primary) 
bank switching 4-15 
primary bus interface 4-4 
ready generation 4-10 to 4-20 
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IIR 6-9 

lattice 6-18 
-cr option 4-92, 4-96 
switches, to support C32 memory pools 4-73 

literature 11-5 

LMS algorithm filters 6-15 

logical address 4-33 

logical operations 
bit manipulation 3-2 
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16-bit dynamic 4-84 
8-bit dynamic 4-76 
8-bit static 4-78 
in C programs C-2 
banks 
address decode for multiple 4-64, 4-65 
zero-wait-state interface for 32- and 8-bit 
SRAM 4-75 
zero-wait-state interface for 32-bit SRAMs 
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